mcglashan mcglashan - 4 months ago 33
Python Question

Select rows containing certain values from pandas dataframe

I have a pandas dataframe whose entries are all strings:

A B C
1 apple banana pear
2 pear pear apple
3 banana pear pear
4 apple apple pear


etc. I want to select all the rows that contain a certain string, say, 'banana'. I don't know which column it will appear in each time. Of course, I can write a for loop and iterate over all rows. But is there an easier or faster way to do this?

Answer

You can create a boolean mask from comparing the entire df against your string and call dropna passing param how='all' to drop rows where your string doesn't appear in all cols:

In [59]:
df[df == 'banana'].dropna(how='all')

Out[59]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

To test for multiple values you can use multiple masks:

In [90]:
banana = df[(df=='banana')].dropna(how='all')
banana

Out[90]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

In [91]:    
apple = df[(df=='apple')].dropna(how='all')
apple

Out[91]:
       A      B      C
1  apple    NaN    NaN
2    NaN    NaN  apple
4  apple  apple    NaN

You can use index.intersection to index just the common index values:

In [93]:
df.loc[apple.index.intersection(banana.index)]

Out[93]:
       A       B     C
1  apple  banana  pear
Comments