Kostas Belivanis - 8 months ago

Python Question

I want to filter a DataFrame on several conditions at once. I'm trying to combine standard boolean expressions, but it doesn't work. My code is:

```
import pandas as pd
import numpy as np

# Setting a dataframe
dates = pd.date_range('1/1/2000', periods=10)
df1 = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# Filtering
df1 = df1.loc[lambda df: -0.5 < df1.A < 0 and 0 <= df1.B <= 1, :]
```

Any thoughts on it?

Answer

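No need for the anonymous lambda function. Simply filter with a boolean mask. Also, notice the use of the bitwise operator `&`, not the boolean operator `and`. Both `and` and a chained comparison such as `-0.5 < df1.A < 0` (which Python expands into `(-0.5 < df1.A) and (df1.A < 0)`) need a single `True`/`False` from a whole Series, so pandas raises `ValueError: The truth value of a Series is ambiguous`. `&` instead combines the masks element by element. Wrap each comparison in parentheses, because `&` binds more tightly than `<`, `>=` and the other comparison operators. The following variants are equivalent: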

```
# DataFrame.query with a string expression
df1 = df1.query('(A > -0.5) & (A < 0) & (B >= 0) & (B <= 1)', engine='python')
# .loc with a boolean mask
df1 = df1.loc[(df1.A > -0.5) & (df1.A < 0) & (df1.B >= 0) & (df1.B <= 1)]
# Plain [] indexing with the same mask
df1 = df1[(df1.A > -0.5) & (df1.A < 0) & (df1.B >= 0) & (df1.B <= 1)]
```
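To see the difference concretely, here is a small self-contained sketch. It swaps the random data for a hypothetical, fixed five-row frame (`df`, not the question's `df1`) so the selected rows are predictable. It shows the chained comparison raising `ValueError`, then checks that `query`, `.loc`, plain `[]` and a corrected callable (lambda) all keep the same rows.

```python
import pandas as pd

# Hypothetical, deterministic data (instead of np.random.randn) so the
# selected rows are predictable.
df = pd.DataFrame(
    {"A": [-0.7, -0.3, -0.1, 0.2, -0.4],
     "B": [0.5, 1.5, 0.0, 0.5, 1.0]},
    index=pd.date_range("1/1/2000", periods=5),
)

# The original style fails: the chained comparison is expanded with `and`,
# which calls bool() on a Series.
raised = False
try:
    df.loc[-0.5 < df.A < 0]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Element-wise mask; each comparison is parenthesized because & binds
# more tightly than the comparison operators.
mask = (df.A > -0.5) & (df.A < 0) & (df.B >= 0) & (df.B <= 1)

via_query = df.query("(A > -0.5) & (A < 0) & (B >= 0) & (B <= 1)")
via_loc = df.loc[mask]
via_getitem = df[mask]
# A callable also works, as long as it uses its own argument (d) and
# returns a boolean mask built with &.
via_lambda = df.loc[lambda d: (d.A > -0.5) & (d.A < 0) & (d.B >= 0) & (d.B <= 1)]

# All four keep only the rows for 2000-01-03 and 2000-01-05.
print(via_loc)
```

So the lambda in the question was not the real problem; `and` and the chained comparisons were. The lambda is simply unnecessary once the mask is built with `&`.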

You could also use `pandas.Series.between`:

```
# inclusive='neither' excludes both endpoints (pandas >= 1.3; older versions use inclusive=False)
df1 = df1.loc[df1.A.between(-0.5, 0, inclusive='neither') & df1.B.between(0, 1)]
```
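A quick way to check that `between` matches the explicit comparisons, including at the endpoints, is the sketch below. The data is hypothetical and deliberately puts values exactly on the boundaries (`A = -0.5`, `A = 0.0`, `B = 1.0`). It assumes pandas 1.3 or later, where `inclusive` takes the strings `'both'`, `'neither'`, `'left'` or `'right'`; on older versions, `inclusive=False` plays the role of `'neither'`.

```python
import pandas as pd

# Hypothetical data with values sitting exactly on the interval endpoints.
df = pd.DataFrame(
    {"A": [-0.7, -0.5, -0.3, 0.0, -0.1],
     "B": [0.5, 0.5, 1.5, 0.5, 1.0]},
    index=pd.date_range("1/1/2000", periods=5),
)

# -0.5 < A < 0: both endpoints excluded ('neither').
# 0 <= B <= 1: both endpoints included (the default, 'both').
mask = df.A.between(-0.5, 0, inclusive="neither") & df.B.between(0, 1)
filtered = df.loc[mask]

# The same filter written with explicit comparisons.
explicit = df.loc[(df.A > -0.5) & (df.A < 0) & (df.B >= 0) & (df.B <= 1)]

print(filtered)                   # only the 2000-01-05 row survives
print(filtered.equals(explicit))  # True
```

Since `between` returns a boolean Series, the two calls can be combined with `&` directly, without extra parentheses around each call.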

Source (Stack Overflow)