E.K. E.K. -4 years ago 168
Python Question

Delete values from pandas dataframe based on logical operation

I want to delete the values that are greater than a certain threshold from a pandas dataframe. Is there an efficient way to perform this? I am doing it with apply and lambda, which works fine but a bit slow for a large dataframe and I feel like there must be a better method.

df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [1,2,3,4,5]})
df

A B
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5


How can this be done without apply and lambda?

df['A'] = df.apply(lambda x: x['A'] if x['A'] < 3 else None, axis=1)
df


A B
0 1.0 1
1 2.0 2
2 NaN 3
3 NaN 4
4 NaN 5

Answer Source

Use a boolean mask against the df:

In[21]:
df[df<3]

Out[21]: 
     A
0  1.0
1  2.0
2  NaN
3  NaN
4  NaN

Here where the boolean condition is not met a False is returned, this will just mask out the df value returning NaN

If you actually want to drop these rows then self-assign:

df = df[df<3]

To compare a specific column:

In[22]:
df[df['A']<3]

Out[22]: 
   A
0  1
1  2

If you want NaN in the removed rows then you can use a trick where a double square brackets will return a single column df so we can mask the df:

In[25]:
df[df[['A']]<3]

Out[25]: 
     A
0  1.0
1  2.0
2  NaN
3  NaN
4  NaN

If you have multiple columns then the above won't work as the boolean mask has to match the orig df, in which case you can reindex against the orig df index:

In[31]:
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [1,2,3,4,5]})
df[df['A']<3].reindex(df.index)

Out[31]: 
     A    B
0  1.0  1.0
1  2.0  2.0
2  NaN  NaN
3  NaN  NaN
4  NaN  NaN

EDIT

You've updated your question again, if you want to just overwrite the single column:

In[32]:
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [1,2,3,4,5]})
df['A'] = df.loc[df['A'] < 3,'A']
df

Out[32]: 
     A  B
0  1.0  1
1  2.0  2
2  NaN  3
3  NaN  4
4  NaN  5
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download