Ty Voss - 11 months ago
R Question

R grepl and filter dataframe

I am dealing with a dataframe that contains two columns with the following values:

Col1 Col2
10 How to; bus; car;
11 How to;
12 How to
13 How to; bus
14 How to; car

What I am trying to do is filter the dataframe so that only rows whose Col2 value is exactly
How to
How to;
are retained and the rest are discarded. The final dataframe should look like this:

Col1 Col2
11 How to;
12 How to

This is what I tried.

filter(df, grepl('How to;|How to', Col2))

This is not working: it returns the entire dataframe. I'm not sure where I am going wrong. Any help is much appreciated.

Answer

I think the comments have already provided an adequate answer; however, I thought I'd give you one closer to your original question.

library(dplyr)
df %>% filter(!(grepl('bus', .$Col2) | grepl('car', .$Col2)))
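For reference, here is a minimal self-contained run of that filter. The data frame is typed in by hand from the question, assuming Col2 is a plain character column with no extra whitespace, and the column is referred to by name inside filter().

```r
library(dplyr)

# Sample data typed in from the question (assumption: plain character column)
df <- data.frame(
  Col1 = 10:14,
  Col2 = c("How to; bus; car;", "How to;", "How to", "How to; bus", "How to; car"),
  stringsAsFactors = FALSE
)

# Drop every row whose Col2 mentions "bus" or "car" anywhere
result <- df %>% filter(!(grepl('bus', Col2) | grepl('car', Col2)))
print(result)  # keeps only Col1 = 11 and 12
```

This only works because the unwanted rows happen to mention bus or car; a value such as "How to; train" would slip through, so matching the wanted values exactly is the more robust option.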

Compared with your attempt, notice a few differences. First, the | inside your pattern was not the problem: in a regular expression | means "or", so 'How to;|How to' already matches either 'How to;' or 'How to'.

Second, notice that I wrote .$Col2. When using the dplyr pipe, the . is shorthand for the data you've passed in, so df$Col2 would also have worked. Strictly speaking it isn't required: inside filter() you can refer to the column simply as Col2, even when it is passed to a base R function such as grepl(), as your original code did.

Finally, the real reason you got the whole dataframe back is that grepl() does not look for exact matches. It returns TRUE for any value that contains the pattern anywhere, and every value in Col2 contains 'How to', so every row is kept. For the same reason, df %>% filter(grepl('How to', .$Col2) | grepl('How to;', .$Col2)) would also return the whole data set, i.e. your current output. You can match whole values exactly, but you need regex metacharacters for that: the anchors ^ (start of the string) and $ (end of the string).
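A minimal sketch of the substring-versus-exact-match point, again on the hand-typed sample data (assumed to be a character column without leading or trailing spaces). The unanchored pattern is TRUE for every row, while anchoring it keeps only the exact values. The pattern '^How to;?$' is just one choice: ';?' makes the trailing semicolon optional, and '^(How to;|How to)$' would be equivalent.

```r
library(dplyr)

# Sample data typed in from the question (assumption: plain character column)
df <- data.frame(
  Col1 = 10:14,
  Col2 = c("How to; bus; car;", "How to;", "How to", "How to; bus", "How to; car"),
  stringsAsFactors = FALSE
)

# Unanchored: the pattern may match anywhere in the string
unanchored <- grepl('How to;|How to', df$Col2)
print(unanchored)  # all TRUE, which is why the original filter kept every row

# Anchored: ^ and $ force the pattern to cover the whole value
exact <- df %>% filter(grepl('^How to;?$', Col2))
print(exact)  # keeps only Col1 = 11 and 12
```

If the real data may carry stray spaces, wrapping the column in trimws() before matching, e.g. grepl('^How to;?$', trimws(Col2)), is one way to keep the anchors from failing on them.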