Daniel Dickison - 1 year ago
R Question

Regex group capture in R with multiple capture-groups

In R, is it possible to extract the captured groups from a regular expression match? As far as I can tell, none of grep, grepl, regexpr, gregexpr, sub, or gsub return the group captures.

I need to extract key-value pairs from strings that are encoded thus:

\((.*?) :: (0\.[0-9]+)\)


I can always just do multiple full-match greps, or do some outside (non-R) processing, but I was hoping I could do it all within R. Is there a function, or a package that provides one, to do this?

Answer

str_match(), from the stringr package, will do this. It returns a character matrix with one column for each group in the match (and one for the whole match):

> library(stringr)
> s = c("(sometext :: 0.1231313213)", "(moretext :: 0.111222)")
> str_match(s, "\\((.*?) :: (0\\.[0-9]+)\\)")
     [,1]                         [,2]       [,3]          
[1,] "(sometext :: 0.1231313213)" "sometext" "0.1231313213"
[2,] "(moretext :: 0.111222)"     "moretext" "0.111222"
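
If adding a package is not an option, a similar result can probably be had in base R. The following is only a sketch, assuming regexec() and regmatches() are available (they are part of base R in reasonably recent versions) and that every input string matches the pattern. The variable names pattern, m, mat and values are illustrative and not part of the original answer.

```r
# Same sample input and pattern as in the stringr example above.
s <- c("(sometext :: 0.1231313213)", "(moretext :: 0.111222)")
pattern <- "\\((.*?) :: (0\\.[0-9]+)\\)"

# regexec() records the positions of the full match and of each capture group;
# regmatches() turns those positions into substrings, one character vector per input.
m <- regmatches(s, regexec(pattern, s))

# Stack the per-string vectors into a matrix: column 1 is the full match,
# columns 2 and 3 are the two capture groups.
# Assumption: every string matched. A non-matching string yields character(0),
# which would need to be filtered out or padded before rbind().
mat <- do.call(rbind, m)

# Build the key-value pairs as a named numeric vector.
values <- as.numeric(mat[, 3])
names(values) <- mat[, 2]
```

Here mat has the same layout as the str_match() result, and values can be indexed by key, for example values["sometext"].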