I need to get words before and after a unique character (in my case: &) in a string in R.
I need to get 'word1' from something like this:
"...something something word1 & word2 something..."
I can get the word after using a Perl regular expression in R:
(?<=& )[^ ]*(?= )
If you use (\S+)\s*&\s*(\S+), then the words on both sides of the & will be captured. This allows for optional whitespace around the ampersand.
You need to double up the backslashes in an R string, and use the regexec and regmatches functions to apply the pattern and extract the matched substrings.
string <- "...something something word1 & word2 something..."
pattern <- "(\\S+)\\s*&\\s*(\\S+)"
match <- regexec(pattern, string)
words <- regmatches(string, match)
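As a rough sketch of how the two words might then be pulled out of the result, here is the same call followed by plain indexing. The names hit, before and after are made up for illustration, and the NA fallback for a string with no & is an assumption about what you would want in that case.

```r
string <- "...something something word1 & word2 something..."
pattern <- "(\\S+)\\s*&\\s*(\\S+)"
words <- regmatches(string, regexec(pattern, string))

# words[[1]] is c(whole match, group 1, group 2), or character(0) if nothing matched
hit <- words[[1]]
if (length(hit) == 3) {
  before <- hit[2]  # word before the ampersand
  after <- hit[3]   # word after the ampersand
} else {
  # Assumed fallback: no "&" with a word on each side was found
  before <- NA_character_
  after <- NA_character_
}
```

Checking the length first avoids a silent NA from indexing past the end of an empty vector when the pattern does not match.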
words is a one-element list holding a three-item vector: the whole matched string followed by the first and second backreferences. So