mcglashan - 9 months ago 92

Python Question

I have a pandas dataframe whose entries are all strings:

`A B C`

1 apple banana pear

2 pear pear apple

3 banana pear pear

4 apple apple pear

etc. I want to select all the rows that contain a certain string, say, 'banana'. I don't know which column it will appear in each time. Of course, I can write a for loop and iterate over all rows. But is there an easier or faster way to do this?

Answer

You can create a boolean mask from comparing the entire df against your string and call `dropna`

passing param `how='all'`

to drop rows where your string doesn't appear in all cols:

```
In [59]:
df[df == 'banana'].dropna(how='all')
Out[59]:
A B C
1 NaN banana NaN
3 banana NaN NaN
```

To test for multiple values you can use multiple masks:

```
In [90]:
banana = df[(df=='banana')].dropna(how='all')
banana
Out[90]:
A B C
1 NaN banana NaN
3 banana NaN NaN
In [91]:
apple = df[(df=='apple')].dropna(how='all')
apple
Out[91]:
A B C
1 apple NaN NaN
2 NaN NaN apple
4 apple apple NaN
```

You can use `index.intersection`

to index just the common index values:

```
In [93]:
df.loc[apple.index.intersection(banana.index)]
Out[93]:
A B C
1 apple banana pear
```