Sasha - 3 months ago
R Question

Extracting a string between other two strings in R

I am trying to find a simple way to extract an unknown substring (it could be anything) that appears between two known substrings. For example, I have a string:

a<-" anything goes here, STR1 GET_ME STR2, anything goes here"


I need to extract the string GET_ME, which sits between STR1 and STR2 (without the surrounding whitespace).

I am trying str_extract(a, "STR1 (.+) STR2"), but I am getting the entire match:

[1] "STR1 GET_ME STR2"


I can of course strip the known strings to isolate the substring I need, but I think there should be a cleaner way to do it with the right regular expression.

Answer

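You may use str_match with the pattern STR1 (.*?) STR2. Unlike str_extract, which returns only the full match, str_match returns a matrix whose first column is the full match and whose remaining columns are the capture groups, so the second column holds the text you want. The lazy quantifier .*? stops at the first STR2 rather than the last. If you have multiple occurrences, use str_match_all.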

> library(stringr)
> a<-" anything goes here, STR1 GET_ME STR2, anything goes here"
> res <- str_match(a, "STR1 (.*?) STR2")
> res[,2]
[1] "GET_ME"
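
For the multiple-occurrence case mentioned above, a minimal sketch with str_match_all might look like the following. The input string and the variable names (test, all_matches, captured) are only illustrative. str_match_all returns a list with one matrix per input string, so the capture group is again the second column.

```r
library(stringr)

# Illustrative input containing two STR1 ... STR2 pairs
test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"

# One matrix per input string; column 1 is the full match, column 2 the capture group
all_matches <- str_match_all(test, "STR1 (.*?) STR2")

# Take the matrix for the first (and only) input string, then its second column
captured <- all_matches[[1]][, 2]
print(captured)
# [1] "GET_ME"  "GET_ME2"
```

If the input is a vector of several strings, the same column can be pulled from each matrix with lapply(all_matches, function(m) m[, 2]).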

Another way uses base R regexec together with regmatches (this returns only the first match):

> test = " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"
> pattern="STR1 (.*?) STR2"
> result <- regmatches(test,regexec(pattern,test))
> result[[1]][2]
[1] "GET_ME"
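
If every occurrence is needed in base R, one possible approach (a different technique from the regexec call above, not part of the original answer) is gregexpr with perl = TRUE and lookarounds, so that the match itself contains only the text between the two markers. The variable names here are illustrative.

```r
# Illustrative input containing two STR1 ... STR2 pairs
test <- " anything goes here, STR1 GET_ME STR2, anything goes here STR1 GET_ME2 STR2"

# (?<=STR1 ) requires "STR1 " immediately before the match,
# (?= STR2) requires " STR2" immediately after it; neither is included in the match.
# Lookarounds need the PCRE engine, hence perl = TRUE.
positions <- gregexpr("(?<=STR1 ).*?(?= STR2)", test, perl = TRUE)

# regmatches returns a list with one character vector per input string
between <- regmatches(test, positions)[[1]]
print(between)
# [1] "GET_ME"  "GET_ME2"
```

Because the lookarounds exclude the markers, there is no capture group to index and the result is already the plain character vector.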