R Question

Named capture in regexp

I need to capture groups in regular expressions by name in R. I tested the code from the post "[Rd] Named capture in regexp" and the example works without problems. Now I am trying to adapt that code to a simple regular expression of my own.


For more details see the code here

This is what I tried in R:

regex = "(xxxx) (?<id>[0-9A-Za-z]{4}) (?<number>[0-9]{5})"
notable = "xxxxcn0700814"
regexpr(regex,notable,perl = TRUE)

and this was the output:

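[1] -1
attr(,"match.length")
[1] -1
attr(,"useBytes")
[1] TRUE
attr(,"capture.start")
        id number
[1,] -1 -1     -1
attr(,"capture.length")
        id number
[1,] -1 -1     -1
attr(,"capture.names")
[1] ""       "id"     "number"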

I can't see what the problem is, because this code is similar to the code on the web page.

Thanks in advance

Answer Source

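The spaces in your pattern are matched literally, and the input string contains none, so the match fails. If you want the whitespace in the PCRE pattern to serve only as formatting, use the (?x) inline modifier: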

regex =  "(?x)(xxxx) (?<id>[0-9A-Za-z]{4}) (?<number>[0-9]{5})"

See the R online demo
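
As a quick check, here is a minimal sketch that runs the (?x) pattern against the string from the question. It uses only base R; the variable name m is chosen here for illustration.

```r
# Pattern and input taken from the question, with the (?x) modifier added
regex   <- "(?x)(xxxx) (?<id>[0-9A-Za-z]{4}) (?<number>[0-9]{5})"
notable <- "xxxxcn0700814"

# With (?x) the spaces in the pattern are ignored, so the whole string matches
m <- regexpr(regex, notable, perl = TRUE)

as.integer(m)             # start of the match: 1
attr(m, "match.length")   # length of the match: 13
```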

See more about VERBOSE / COMMENT / IgnorePatternWhitespace modifier in the Documentation.

If you want to match a literal space while this modifier is on, you have to escape it or put it inside a character class. If you need to match any whitespace character, use the \s shorthand.
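
The three options can be sketched as follows. The input string spaced is a made-up example (the string in the question has no spaces), and the variable names are only for illustration. Note that backslashes are doubled inside R string literals.

```r
# Hypothetical input that really contains spaces between the parts
spaced <- "xxxx cn07 00814"

# Under (?x): an escaped space ("\\ " in R source) and a space inside a
# character class ("[ ]") both match a literal space
rx_literal <- "(?x) (xxxx) \\  (?<id>[0-9A-Za-z]{4}) [ ] (?<number>[0-9]{5})"

# \s matches any whitespace character (space, tab, newline, ...)
rx_any_ws  <- "(?x) (xxxx) \\s (?<id>[0-9A-Za-z]{4}) \\s (?<number>[0-9]{5})"

grepl(rx_literal, spaced, perl = TRUE)            # TRUE
grepl(rx_any_ws,  spaced, perl = TRUE)            # TRUE
grepl(rx_literal, "xxxxcn0700814", perl = TRUE)   # FALSE: the spaces are now required
```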

If you do not need this "prettifying", just remove the spaces from your pattern, since without (?x) they are meaningful:

regex =  "(xxxx)(?<id>[0-9A-Za-z]{4})(?<number>[0-9]{5})"
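
Once the pattern matches, the named groups can be pulled out of the attributes that regexpr attaches to its result. The following is a minimal sketch using base R only; the variable names (m, starts, lens, groups) are chosen here for illustration and are not part of the original answer.

```r
regex   <- "(xxxx)(?<id>[0-9A-Za-z]{4})(?<number>[0-9]{5})"
notable <- "xxxxcn0700814"

m <- regexpr(regex, notable, perl = TRUE)

# With perl = TRUE, regexpr records the start and length of every capture group
starts <- as.vector(attr(m, "capture.start"))
lens   <- as.vector(attr(m, "capture.length"))
nms    <- attr(m, "capture.names")

# Cut each group out of the input and label it with its capture name
groups <- substring(notable, starts, starts + lens - 1)
names(groups) <- nms

# Keep only the named groups (the unnamed first group has an empty name)
groups[nms != ""]
#      id  number
#  "cn07" "00814"
```

For a vector of input strings the same attributes are matrices with one row per input, so the extraction would have to be applied row by row.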
