I am trying to use R to parse through a number of entries. I have two requirements for the entries I want back: I want all the entries that contain the word apple but don't contain the word orange.
Using a regular expression, you could do the following.
x <- c('I like apples', 'I really like apples', 'I like apples and oranges',
       'I like oranges and apples', 'I really like oranges and apples but oranges more')
x[grepl('^((?!.*orange).)*apple.*$', x, perl=TRUE)]
# "I like apples" "I really like apples"
At each position, the negative lookahead (?!.*orange) checks that the substring orange does not occur anywhere in the rest of the line. If it does not, the dot . matches one character (any character except a line break). The lookahead and the dot are wrapped in a group that is repeated 0 or more times. Next we look for apple followed by any character except a line break (.*, 0 or more times). Finally, the start and end of line anchors (^ and $) are in place to make sure the whole input is consumed.
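One caveat that is worth checking on your own data: because the group may repeat zero times, an entry that begins with apple can still match even when orange appears later in it. The sketch below illustrates this with a hypothetical vector y (these entries are my own, not from the question) and compares the pattern with a plain two-step filter that uses fixed-string grepl calls. Treat it as an illustration under those assumptions rather than a definitive replacement.

```r
# Hypothetical test entries, not from the original question
y <- c('apples and oranges', 'I like apples', 'I like pears')

# Pattern from the answer: the group can repeat zero times, so 'apple'
# can match at position 0 even though the lookahead fails there
tempered <- grepl('^((?!.*orange).)*apple.*$', y, perl = TRUE)

# Alternative: two fixed-string tests combined with logical AND / NOT
two_step <- grepl('apple', y, fixed = TRUE) & !grepl('orange', y, fixed = TRUE)

y[tempered]  # entries kept by the regex
y[two_step]  # entries kept by the two-step filter
```

The two-step version avoids lookaheads entirely, which also makes the intent (contains apple, does not contain orange) easier to read at a glance.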
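UPDATE: You could use the following if performance is an issue. It checks for orange once, at the start of the line, instead of at every position, and still requires apple to be present.
x[grepl('^(?!.*orange).*apple.*$', x, perl=TRUE)]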