E.K. -4 years ago 168

Python Question

I want to delete the values that are greater than a certain threshold from a pandas dataframe. Is there an efficient way to perform this? I am doing it with apply and lambda, which works fine but a bit slow for a large dataframe and I feel like there must be a better method.

`df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [1,2,3,4,5]})`

df

A B

0 1 1

1 2 2

2 3 3

3 4 4

4 5 5

How can this be done without apply and lambda?

`df['A'] = df.apply(lambda x: x['A'] if x['A'] < 3 else None, axis=1)`

df

A B

0 1.0 1

1 2.0 2

2 NaN 3

3 NaN 4

4 NaN 5

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

Use a boolean mask against the df:

```
In[21]:
df[df<3]
Out[21]:
A
0 1.0
1 2.0
2 NaN
3 NaN
4 NaN
```

Here where the boolean condition is not met a `False`

is returned, this will just mask out the df value returning `NaN`

If you actually want to drop these rows then self-assign:

```
df = df[df<3]
```

To compare a specific column:

```
In[22]:
df[df['A']<3]
Out[22]:
A
0 1
1 2
```

If you want `NaN`

in the removed rows then you can use a trick where a double square brackets will return a single column df so we can mask the df:

```
In[25]:
df[df[['A']]<3]
Out[25]:
A
0 1.0
1 2.0
2 NaN
3 NaN
4 NaN
```

If you have multiple columns then the above won't work as the boolean mask has to match the orig df, in which case you can `reindex`

against the orig df index:

```
In[31]:
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [1,2,3,4,5]})
df[df['A']<3].reindex(df.index)
Out[31]:
A B
0 1.0 1.0
1 2.0 2.0
2 NaN NaN
3 NaN NaN
4 NaN NaN
```

**EDIT**

You've updated your question again, if you want to just overwrite the single column:

```
In[32]:
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [1,2,3,4,5]})
df['A'] = df.loc[df['A'] < 3,'A']
df
Out[32]:
A B
0 1.0 1
1 2.0 2
2 NaN 3
3 NaN 4
4 NaN 5
```

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**

Latest added