mcglashan mcglashan - 1 year ago 170
Python Question

Select rows containing certain values from pandas dataframe

I have a pandas dataframe whose entries are all strings:

1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear

etc. I want to select all the rows that contain a certain string, say, 'banana'. I don't know which column it will appear in each time. Of course, I can write a for loop and iterate over all rows. But is there an easier or faster way to do this?

Answer Source

You can create a boolean mask from comparing the entire df against your string and call dropna passing param how='all' to drop rows where your string doesn't appear in all cols:

In [59]:
df[df == 'banana'].dropna(how='all')

        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

To test for multiple values you can use multiple masks:

In [90]:
banana = df[(df=='banana')].dropna(how='all')

        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

In [91]:    
apple = df[(df=='apple')].dropna(how='all')

       A      B      C
1  apple    NaN    NaN
2    NaN    NaN  apple
4  apple  apple    NaN

You can use index.intersection to index just the common index values:

In [93]:

       A       B     C
1  apple  banana  pear