Kostas Belivanis - 1 month ago
Python Question

Filter down dataframe

I want to filter down a dataframe. I am trying to combine multiple standard boolean expressions, but it doesn't work. My code is:

import pandas as pd
import numpy as np

# Setting a dataframe
dates = pd.date_range('1/1/2000', periods=10)

df1 = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# Filtering
df1 = df1.loc[lambda df: -0.5 < df1.A < 0 and 0 <= df1.B <= 1, :]


Any thoughts on it?

Answer

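There is no need for the anonymous lambda function; simply filter with a boolean mask. Two things break the original code: the chained comparison (-0.5 < df1.A < 0) and the and operator both try to reduce a whole Series to a single True/False value, which raises a ValueError. Use the element-wise bitwise operator, &, instead of the boolean operator, and, and wrap each comparison in parentheses. The following variants are equivalent: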

df1 = df1.query('(A > -0.5) & (A < 0) & (B >= 0) & (B <= 1)', engine='python')

df1 = df1.loc[(df1.A > -0.5) & (df1.A < 0) & (df1.B >= 0) & (df1.B <= 1)]

df1 = df1[(df1.A > -0.5) & (df1.A < 0) & (df1.B >= 0) & (df1.B <= 1)]
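
To see why and fails and that the three variants select the same rows, a minimal sketch along these lines may help. It assumes a small hand-made frame (the values are invented for illustration) rather than the random data from the question, so the outcome is reproducible.

```python
import pandas as pd

# Small hand-made frame (illustrative values, not from the question)
df = pd.DataFrame({
    'A': [-0.3, -0.7, -0.1, 0.2, -0.4],
    'B': [0.5, 0.5, 1.5, 0.5, 0.0],
})

# `and` asks each Series for a single truth value, which pandas refuses to give
try:
    (df.A > -0.5) and (df.A < 0)
    and_error = None
except ValueError as exc:
    and_error = exc

# `&` combines the masks element-wise; the parentheses are required because
# `&` binds more tightly than the comparison operators
mask = (df.A > -0.5) & (df.A < 0) & (df.B >= 0) & (df.B <= 1)

by_query = df.query('(A > -0.5) & (A < 0) & (B >= 0) & (B <= 1)', engine='python')
by_loc = df.loc[mask]
by_getitem = df[mask]

print(and_error)
print(by_loc)
```

Building the mask once and reusing it keeps the condition in a single place, which is easier to read than repeating the full expression in every variant.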

Consider also using pandas.Series.between, which includes both endpoints by default:

# inclusive='neither' requires pandas >= 1.3 (older versions: inclusive=False)
df1 = df1.loc[df1.A.between(-0.5, 0, inclusive='neither') & df1.B.between(0, 1)]
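
As a rough check of how between treats its endpoints, the sketch below compares it with the explicit comparisons. The frame and its values are made up for illustration, and the string form of inclusive assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hand-made values chosen to sit exactly on the interval boundaries
df = pd.DataFrame({
    'A': [-0.5, -0.3, 0.0, -0.1],
    'B': [0.0, 1.0, 0.5, 2.0],
})

# inclusive='neither' excludes both endpoints: -0.5 < A < 0
a_mask = df.A.between(-0.5, 0, inclusive='neither')
# the default includes both endpoints: 0 <= B <= 1
b_mask = df.B.between(0, 1)

filtered = df.loc[a_mask & b_mask]

# Same filter written with explicit comparisons
explicit = df.loc[(df.A > -0.5) & (df.A < 0) & (df.B >= 0) & (df.B <= 1)]

print(filtered)
```

Both forms should select the same rows; between mainly saves repeating the column name for each bound.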