I want to extract all substrings that begin with M and are terminated by a *, including overlapping ones.
Take the string below as an example:
vec<-c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
Ideally, this would return:
MGMTPRLGLESLLE
MTPRLGLESLLE
I have tried the code below:
regmatches(vec, gregexpr('(?<=M).*?(?=\\*)', vec, perl=T))[[1]]
but this drops the leading M and returns only the first substring rather than all of the substrings within it:
"GMTPRLGLESLLE"
You can use
(?=(M[^*]*)\*)
and read the substrings from capture group 1. Details:
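- (?= - start of a positive lookahead that matches a location that is immediately followed by:
  - (M[^*]*) - Group 1: M, then zero or more chars other than a * char
  - \* - a * char
- ) - end of the lookahead.
Because the lookahead consumes no text, the regex engine can try again at the very next position, which is how overlapping substrings such as MGMTPRLGLESLLE and MTPRLGLESLLE are both captured.
See the R demo: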
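A minimal sketch of such a demo, assuming the stringr package is installed (the variable names are illustrative). str_match_all() returns one matrix per input string, with the full match in the first column and capture group 1 in the second:

```r
library(stringr)

vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")

# One matrix per input string: column 1 is the (empty) overall match,
# column 2 is capture group 1, which holds the substring we want.
matches <- str_match_all(vec, "(?=(M[^*]*)\\*)")

# Keep only the capture-group column and flatten to a character vector.
result <- unlist(lapply(matches, function(z) z[, 2]))
print(result)
## => [1] "MGMTPRLGLESLLE" "MTPRLGLESLLE"
```

The overall match is always empty here because the whole pattern is a lookahead, so only the capture-group column is useful.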
If you prefer a base R solution:
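One possible base R sketch, assuming only gregexpr() with perl = TRUE and the capture.start / capture.length attributes it attaches to its result (the variable names are illustrative, and this handles a single input string that has at least one match):

```r
vec <- c("SHVANSGYMGMTPRLGLESLLE*A*MIRVASQ")
reg <- "(?=(M[^*]*)\\*)"

# perl = TRUE is required for the lookahead; it also records where each
# capture group matched in the "capture.start"/"capture.length" attributes.
m <- gregexpr(reg, vec, perl = TRUE)[[1]]
starts  <- attr(m, "capture.start")[, 1]
lengths <- attr(m, "capture.length")[, 1]

# Cut capture group 1 out of the original string for every match.
result <- substring(vec, starts, starts + lengths - 1)
print(result)
## => [1] "MGMTPRLGLESLLE" "MTPRLGLESLLE"
```

Reading the capture positions directly avoids regmatches(), which would return only the zero-length overall matches for a pattern that consists entirely of a lookahead.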