Question (R) · asked by Roman, 4 years ago

Search text by pattern - regular expressions

I am trying to find some text patterns within my database.
I have a column with job titles (Data Analyst, Data Scientist, etc.) and I'm trying to find all records with a certain job title. I've been using the following code:

grepl(".*Data.*Analyst.*", data$jobtitle, ignore.case = TRUE)


It works well. However, it doesn't match titles where the keywords appear in the opposite order, such as "Analyst Data" or "Scientist Data".

Ideally, I would like to search for "Data" and "Analyst" regardless of where the keywords appear in the title.

Answer

If you want to check for the presence of both keywords "data" and "analyst" in any order, you can use positive lookaheads:

grepl("(?=.*analyst)(?=.*data)", data$jobtitle, perl = TRUE, ignore.case = TRUE)

This returns TRUE when both words are present, regardless of their order or of any other words in the title:

grepl("(?=.*analyst)(?=.*data)",c("Analyst data","Data Analyst","Data scientist","Analyst Science data"),perl=T,ignore.case=T)
[1]  TRUE  TRUE FALSE  TRUE
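The same idea generalizes to any number of keywords: build one lookahead per keyword and paste them together. The sketch below is a hypothetical helper, `has_all_keywords()`, that is not part of the original answer. It also wraps each keyword in `\b` word boundaries, because the plain lookahead pattern above matches substrings, so "data" would also match inside "Database". It assumes the keywords are plain words with no regex metacharacters; anything like `+` or `.` would need escaping first.

```r
# Hypothetical helper (not from the original answer): TRUE for each element of x
# that contains every keyword, in any order, as a whole word.
# Assumes the keywords are plain words with no regex metacharacters.
has_all_keywords <- function(x, keywords) {
  # One positive lookahead per keyword, anchored at the start of the string.
  # \\b (a word boundary) stops "data" from matching inside "Database".
  lookaheads <- paste0("(?=.*\\b", keywords, "\\b)", collapse = "")
  grepl(paste0("^", lookaheads), x, perl = TRUE, ignore.case = TRUE)
}

titles <- c("Analyst data", "Data Analyst", "Data scientist",
            "Analyst Science data", "Database Analyst")
has_all_keywords(titles, c("data", "analyst"))
# [1]  TRUE  TRUE FALSE  TRUE FALSE
```

If you prefer to avoid lookaheads altogether, combining one `grepl()` call per keyword with `&`, e.g. `grepl("data", x, ignore.case = TRUE) & grepl("analyst", x, ignore.case = TRUE)`, gives the same substring-based result as the original answer.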