Ben Ben - 24 days ago 13
R Question

Regex to detect a string delimited by non-alphabetic characters (or nothing)

I'd like to write a regex to detect the string "el" (it stands for "eliminated" and appears inside a bunch of poorly formatted score data).

For example

tests <- c("el", "hello", "123el", "el/27")


Here I'm looking for the result TRUE, FALSE, TRUE, TRUE. My sad attempts, which don't work for obvious reasons:

library(stringr)
str_detect(tests, "el") # TRUE TRUE TRUE TRUE
str_detect(tests, "[^a-z]el") # FALSE FALSE TRUE FALSE

Answer

Use the regex (\\b|[^[:alpha:]])el(\\b|[^[:alpha:]]) along with grepl:

> tests <- c("el", "hello", "123el", "el/27")
> y <- grepl("(\\b|[^[:alpha:]])el(\\b|[^[:alpha:]])", tests)
> y
[1]  TRUE FALSE  TRUE  TRUE

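The condition for el to appear as a standalone entity is that each side has either a word boundary (\b, written \\b inside an R string) or a non-alphabetic character (the negated POSIX bracket expression [^[:alpha:]]). The word boundary alone is not enough, because digits count as word characters: in "123el" there is no boundary between 3 and e, so that case is caught by [^[:alpha:]] instead.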
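
As a rough sketch of the same idea, here are two alternative spellings that are not part of the original answer: one using the anchors ^ and $ in place of \b, and one using lookarounds (which need perl = TRUE in base R). The names anchor_pat, lookaround_pat and has_el are made up for illustration.

```r
tests <- c("el", "hello", "123el", "el/27")

# Pattern from the answer above (default regex engine).
answer_pat <- "(\\b|[^[:alpha:]])el(\\b|[^[:alpha:]])"

# Assumed alternative 1: start/end anchors instead of \b.
# "el" must be preceded by the start of the string or a non-letter,
# and followed by a non-letter or the end of the string.
anchor_pat <- "(^|[^[:alpha:]])el([^[:alpha:]]|$)"

# Assumed alternative 2: lookarounds, "no letter directly before or after".
# Lookarounds are a PCRE feature, hence perl = TRUE below.
lookaround_pat <- "(?<![[:alpha:]])el(?![[:alpha:]])"

# Hypothetical helper wrapping the lookaround version.
has_el <- function(x) grepl(lookaround_pat, x, perl = TRUE)

grepl(answer_pat, tests)   # TRUE FALSE TRUE TRUE
grepl(anchor_pat, tests)   # TRUE FALSE TRUE TRUE
has_el(tests)              # TRUE FALSE TRUE TRUE
```

All three are case-sensitive, so "El" would not match unless ignore.case = TRUE is passed to grepl. The lookaround pattern should also work with stringr's str_detect(tests, lookaround_pat), since stringr's regex engine supports lookarounds, though that call is not tested here.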