M. Siwik - 2 months ago 6x
R Question

# How to match a regular expression exactly in R and pull out the pattern

I want to extract a pattern from my vector of strings:

``````
string <- c(
  "P10000101 - Przychody netto ze sprzedazy produktów",
  "P10000102_PL - Przychody nettozy uslug",
  "P1000010201_PL - Handlowych, marketingowych, szkoleniowych",
  "P100001020101 - - Handlowych,, szkoleniowych - refaktury",
  "- Handlowych, marketingowych,P100001020102, - pozostale"
)
``````

As a result, I want to get only the part of each string that matches the regular expression:

``````
result <- c(
  "P10000101",
  "P10000102_PL",
  "P1000010201_PL",
  "P100001020101",
  "P100001020102"
)
``````

I tried this
`pattern = "([PLA]\\d+)"`
and different combinations of
`value = T, fixed = T, perl = T`.

``````
grep(x = string, pattern = "([PLA]\\d+(_PL)?)", fixed = T)
``````

We can try `str_extract` from the `stringr` package:

``````
library(stringr)
str_extract(string, "P\\d+(_[A-Z]+)*")
#[1] "P10000101"      "P10000102_PL"   "P1000010201_PL" "P100001020101"  "P100001020102"
``````

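`grep` only reports which elements of the vector contain a match (their indices, or the whole elements with `value = TRUE`); it does not return the matched substring. Note also that `fixed = TRUE` treats the pattern as a literal string, so a regular expression will not match at all with that option. For extraction, use `sub`, `regexpr`/`gregexpr` with `regmatches`, or `str_extract`.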
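To make the difference concrete, here is a small sketch comparing the `grep` variants with a `sub`-based extraction. The shortened vector and the `"no code here"` element are made up for illustration, and the `sub` call is only one possible way to write it.

```r
# Shortened sample vector; the last element is hypothetical (no code in it)
string <- c(
  "P10000102_PL - Przychody nettozy uslug",
  "- Handlowych, marketingowych,P100001020102, - pozostale",
  "no code here"
)
pattern <- "P\\d+(_[A-Z]+)*"

# grep() returns the indices of the elements that contain a match
idx <- grep(pattern, string)

# value = TRUE returns the whole matching elements, not the matched part
whole <- grep(pattern, string, value = TRUE)

# fixed = TRUE treats the pattern as a literal string, so nothing matches
literal <- grep(pattern, string, fixed = TRUE)

# sub() can extract by capturing the code and discarding everything around it;
# it is applied only to the elements that matched, because sub() returns
# non-matching elements unchanged
extracted <- sub(".*?(P\\d+(_[A-Z]+)*).*", "\\1", string[idx], perl = TRUE)
print(extracted)
```

The lazy `.*?` at the start lets the capture group begin at the first `P` that is followed by digits, which is why `perl = TRUE` is used here.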

Using base R (`regexpr`/`regmatches`):

``````
regmatches(string, regexpr("P\\d+(_[A-Z]+)*", string))
#[1] "P10000101"      "P10000102_PL"   "P1000010201_PL" "P100001020101"  "P100001020102"
``````
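One thing to keep in mind with the base R version: `regmatches` combined with `regexpr` drops elements that have no match, so the result can be shorter than the input, whereas `str_extract` returns `NA` in those positions. A minimal sketch of keeping the result aligned with the input, assuming a made-up vector that contains one string without a code:

```r
# Hypothetical input: the second element contains no code
string <- c("P10000101 - netto", "no code here", "x,P100001020102, - pozostale")
pattern <- "P\\d+(_[A-Z]+)*"

m <- regexpr(pattern, string)   # -1 marks elements without a match

# regmatches() silently drops the non-matching element
short <- regmatches(string, m)

# Pre-fill with NA and assign only where regexpr() found something
aligned <- rep(NA_character_, length(string))
aligned[as.vector(m) != -1] <- regmatches(string, m)
print(aligned)
```

This keeps `aligned` the same length as `string`, which matters if the result is going to be stored as a column next to the original data.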

Basically, the pattern matches `P` followed by one or more digits (`\\d+`), followed by zero or more (`*`, greedy) repetitions of `_` and one or more uppercase letters (`_[A-Z]+`).
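As a rough illustration of how each part of the pattern behaves, the sketch below runs it over a few made-up inputs (none of them are from the question) and uses `gregexpr`, which returns every match per string rather than only the first one.

```r
pattern <- "P\\d+(_[A-Z]+)*"

# Hypothetical test inputs chosen to exercise each part of the pattern
x <- c(
  "P1",                          # digits only, no suffix
  "P10000102_PL",                # one suffix group
  "P10000102_PL_EN",             # `*` allows several suffix groups
  "P_PL",                        # no digits after P, so no match
  "P10000101 and P10000102_PL"   # two codes in one string
)

# gregexpr() finds all matches; regmatches() then returns a list
# with one character vector per input element
all_matches <- regmatches(x, gregexpr(pattern, x))
print(all_matches)
```

Elements without a match come back as `character(0)`, so the list always has the same length as the input.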