GregS - 2 months ago
Perl Question

Regular expression to grab word before a certain character R Perl

I need to get words before and after a unique character (in my case: &) in a string in R.

I need to get 'word1' from something like this:
"...something something word1 & word2 something..."

I can get the word after using a Perl regular expression in R:

(?<=& )[^ ]*(?= )

(It seems to behave the way I would like. I got it by combining answers I found on this site.)
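For reference, here is one way the pattern above might be applied in R. This is only a sketch: the variable names are made up for illustration, and it assumes the pattern is run through regexpr with perl = TRUE, since lookbehind and lookahead need the Perl-compatible engine.

```r
# Sample string from the question
string <- "...something something word1 & word2 something..."

# Lookbehind for "& ", then a run of non-space characters, then a lookahead for a space
after_pattern <- "(?<=& )[^ ]*(?= )"

# perl = TRUE switches to the PCRE engine, which supports lookarounds
word_after <- regmatches(string, regexpr(after_pattern, string, perl = TRUE))
```

Because the lookarounds do not consume any text, only the word itself ends up in word_after.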

I now need to get the word preceding the & symbol. The length of the words changes, as does the number of preceding words and spaces. word1 could be letters and numbers, bounded by spaces on either side.


If you use (\S+)\s*&\s*(\S+) then the words on both sides of & will be captured. This allows for optional whitespace around the ampersand.

You need to double-up the backslashes in an R string, and use the regexec and regmatches functions to apply the pattern and extract the matched substrings.

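string  <- "...something something word1 & word2 something..."
# Backslashes are doubled inside the R string literal
pattern <- "(\\S+)\\s*&\\s*(\\S+)"
# regexec finds the match and its capture groups; regmatches extracts the text
match   <- regexec(pattern, string)
words   <- regmatches(string, match)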

Now words is a one-element list holding a three-item vector: the whole matched string followed by the first and second capture groups. So words[[1]][2] is word1 and words[[1]][3] is word2.
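If you have several strings, the same approach extends to a character vector, because regexec and regmatches are vectorised. The following is a minimal sketch: the sample strings and the pick helper are invented for illustration, and it assumes you want NA for any string that has no match.

```r
# Hypothetical inputs: spaced ampersand, unspaced ampersand, and no ampersand
strings <- c("alpha beta1 & gamma2 delta", "x1&y2", "no ampersand here")
pattern <- "(\\S+)\\s*&\\s*(\\S+)"

# One list element per input string; character(0) where nothing matched
matches <- regmatches(strings, regexec(pattern, strings))

# Illustrative helper: take the n-th item of a match vector, or NA if it is absent
pick <- function(m, n) if (length(m) >= n) m[n] else NA_character_

before <- vapply(matches, pick, character(1), n = 2)  # word before the &
after  <- vapply(matches, pick, character(1), n = 3)  # word after the &
```

Checking the length first avoids an indexing surprise on strings without an ampersand, for which regmatches returns an empty vector.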