Python Question

Pandas filter multiple columns with a single criterion

I have an Excel sheet with over a hundred columns. I need to filter on about five of these to see which rows have "no" in one of those columns. Is there a way to filter multiple columns with a single search criterion, such as:

no_invoice_filter = df[(df['M1: PL - INVOICED']) & (df['M2: EX - INVOICED']) & (df['M3: TEST DEP - INVOICED']) == 'No']


As opposed to separately writing out whether each column equals "no".

The error for the code above is:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'
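
(For context: the TypeError happens because == has lower precedence than &, so the & operators are applied to the raw string columns before the comparison with 'No' is ever made. The verbose per-column version this question is trying to avoid needs each comparison parenthesised separately, and might look like the sketch below, using toy data in place of the real Excel sheet.)

import pandas as pd

# toy data standing in for the Excel sheet (the real one has ~100 columns)
df = pd.DataFrame({'M1: PL - INVOICED': ['Yes', 'No', 'No'],
                   'M2: EX - INVOICED': ['No', 'No', 'Yes'],
                   'M3: TEST DEP - INVOICED': ['No', 'No', 'No']})

# each comparison gets its own parentheses so that == is evaluated before &
no_invoice_filter = df[(df['M1: PL - INVOICED'] == 'No') &
                       (df['M2: EX - INVOICED'] == 'No') &
                       (df['M3: TEST DEP - INVOICED'] == 'No')]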

Answer

You can do:

df[(df[['M1: PL - INVOICED','M2: EX - INVOICED','M3: TEST DEP - INVOICED']] == 'No')]

So you essentially pass a list of the columns of interest and compare just those columns against your scalar value. If you're after 'No' appearing anywhere in them, then use any(axis=1):

In [115]:
import pandas as pd

# the scalar values 'no' and 'yes' are broadcast down every row
df = pd.DataFrame({'a':'no', 'b':'yes', 'c':['yes','no','yes','no','no']})
df

Out[115]:
    a    b    c
0  no  yes  yes
1  no  yes   no
2  no  yes  yes
3  no  yes   no
4  no  yes   no

With any(axis=1) it returns all rows where 'no' appears in any of the columns of interest (here that is every row, since column a is 'no' throughout):

In [133]:    
df[(df[['a','c']] == 'no').any(axis=1)]

Out[133]:
    a    b    c
0  no  yes  yes
1  no  yes   no
2  no  yes  yes
3  no  yes   no
4  no  yes   no
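
Tying this back to the question, the same pattern applied to the original column names would look roughly like the sketch below (df here is assumed to be the DataFrame read from the Excel sheet, e.g. with pd.read_excel):

cols = ['M1: PL - INVOICED', 'M2: EX - INVOICED', 'M3: TEST DEP - INVOICED']

# rows where any of the selected columns is 'No'
no_invoice_filter = df[(df[cols] == 'No').any(axis=1)]

# rows where every selected column is 'No' (what the & chain in the question was aiming for)
all_no_filter = df[(df[cols] == 'No').all(axis=1)]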

You can also use the mask to drop the rows where a specific column didn't match (and so became NaN) using dropna:

In [132]:    
df[df[['a','c']] == 'no'].dropna(subset=['c'])

Out[132]:
    a    b   c
1  no  NaN  no
3  no  NaN  no
4  no  NaN  no
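
Note that because the mask only covers columns a and c, the column outside the mask (b here) comes back as NaN in the result. With this example data, the same rows can be selected without masking anything by filtering on the column directly:

df[df['c'] == 'no']

#     a    b   c
# 1  no  yes  no
# 3  no  yes  no
# 4  no  yes  no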