Connor M - 3 months ago
R Question

Regex (ICU) for matching between parentheses

I'm looking for a regex that creates a capture group for a word occurring within parentheses, ignoring the parentheses themselves. The regex must be either PCRE or ICU.

Input:

( lakshd asd___ asa1123 Name : _____)


Desired Output:
Name


What I've tried:

\\((Name|name|NAME)\\)


(?<=\\()name|Name|NAME(?=\\))


\\(name|Name|NAME\\)

Answer

What I've tried:

\\((Name|name|NAME)\\)
(?<=\\()name|Name|NAME(?=\\))
\\(name|Name|NAME\\)

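All these patterns were meant to find name, Name, or NAME with a ( immediately before it and a ) immediately after it; they differ only in what is captured or returned as the match. That cannot succeed on this input, because other text sits between the parentheses and the word. (In the second and third patterns the alternation is also not grouped, so the lookbehind or \\( applies only to name and the lookahead or \\) only to NAME.) To match a word anywhere inside parentheses, put \([^()]* before the value you need and [^()]*\) after it; in an R string literal each backslash is doubled.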
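As a rough illustration of that difference, the sketch below runs the first attempted pattern and a wrapped variant against the sample input. It assumes the stringr package is installed; the variable names (tight, wrapped, tight_hit, m) are only for illustration.

```r
library(stringr)

s <- "( lakshd  asd___ asa1123 Name : _____)"

# Attempt from the question: "(" and ")" must touch the word directly.
tight <- "\\((Name|name|NAME)\\)"
tight_hit <- str_detect(s, tight)

# Wrapped variant: [^()]* absorbs the other text inside the parentheses.
wrapped <- "\\([^()]*(Name|name|NAME)[^()]*\\)"
m <- str_match(s, wrapped)

tight_hit  # FALSE, since "Name" is not adjacent to either parenthesis
m[, 2]     # "Name", the content of capture group 1
```

Only the surrounding [^()]* pieces change between the two patterns; the capture group itself is the same.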

Also, there is no point in extracting something you already know.

So, if you plan to extract the last word from the parentheses, you may use

> library(stringr)
> s <- "( lakshd  asd___ asa1123 Name : _____)"
> # Column 1 of the result is the whole match; column 2 is capture group 1
> res <- str_match(s, "(?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\)")
> res[,2]
[1] "Name"

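Note that str_match returns a character matrix: the first column holds the whole match and each following column holds one capture group, which is why res[,2] gives the captured word.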
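Since the question allows PCRE as well as ICU, here is a minimal base R sketch of the same extraction without stringr. It assumes an R version where regexec() accepts perl = TRUE; the names pattern, m, whole_match and last_word are illustrative only.

```r
s <- "( lakshd  asd___ asa1123 Name : _____)"
pattern <- "(?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\)"

# regexec() records the positions of the whole match and of each group;
# perl = TRUE selects the PCRE engine instead of the default one.
m <- regmatches(s, regexec(pattern, s, perl = TRUE))

# m is a list with one element per input string; each element is a
# character vector: whole match first, then the capture groups.
whole_match <- m[[1]][1]
last_word <- m[[1]][2]

last_word  # "Name"
```

The list-of-vectors result plays the same role as the matrix returned by str_match, so the second element here corresponds to res[,2].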

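The (?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\) pattern matches a parenthesized substring that contains no nested parentheses and captures the last whole word inside it that starts with a letter. The (?i) flag makes [a-z] case-insensitive, and because the first [^()]* is greedy, the capture lands on the rightmost word that qualifies.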
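To make the pieces of that pattern easier to read, the sketch below spells it out one element per line using stringr's regex() helper with comments = TRUE and ignore_case = TRUE in place of the inline (?i). This is only one possible way to annotate it, and it assumes stringr is installed.

```r
library(stringr)

s <- "( lakshd  asd___ asa1123 Name : _____)"

# In comments mode, whitespace in the pattern is ignored and "#" starts
# a comment that runs to the end of the line.
pattern <- regex("
  \\(           # literal opening parenthesis
  [^()]*        # greedy run of non-parenthesis characters
  \\b           # word boundary before the captured word
  ([a-z]\\w*)   # group 1: a letter followed by word characters
  \\b           # word boundary after the captured word
  [^()]*        # remaining non-parenthesis characters
  \\)           # literal closing parenthesis
", ignore_case = TRUE, comments = TRUE)

res <- str_match(s, pattern)
res[, 2]
```

On the sample input this should give the same captured word as the one-line pattern; _____ is skipped because it does not start with a letter.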
