oscarafone - 1 month ago

Apply same selection (cut) to multiple dataframes

My question is about making selections in pandas (Python).

As you know, one can apply a selection (or 'cut') to a dataframe by doing

df = df[df.area > 10]
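For reference, `df.area > 10` evaluates to a boolean Series aligned to `df`'s index, and indexing with it keeps the rows where the value is True. A small sketch with made-up data (the values are illustrative only):

```python
import pandas as pd

# Illustrative data; only the 'area' column matters here
df = pd.DataFrame({"area": [5, 12, 8, 20]})

mask = df.area > 10       # boolean Series, one entry per row of df
print(mask.tolist())      # [False, True, False, True]

df = df[mask]             # keep only the rows where mask is True
print(df.index.tolist())  # [1, 3]
print(df.area.tolist())   # [12, 20]
```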


if you wanted to (say) select all rows whose value in the area column was greater than 10. But suppose you have many dataframes, and you'd like to eventually apply this cut to all of them. It would be nice to do something like

cut = dataframe.area > 10


and then somehow be able to do

df = df[cut]


Obviously, given the strategy above, this won't work, because cut is a boolean mask computed from one specific dataframe's values. But is there a way to approximate this behavior?
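To make the problem concrete: a mask such as `df1.area > 10` stores the comparison results for `df1`'s values, so applying it to another dataframe with the same index silently selects rows based on `df1`'s data. A minimal sketch with two hypothetical dataframes `df1` and `df2`:

```python
import pandas as pd

# Two hypothetical dataframes sharing the same index but holding different values
df1 = pd.DataFrame({"area": [5, 12, 8, 20]})
df2 = pd.DataFrame({"area": [30, 1, 15, 2]})

cut = df1.area > 10  # computed from df1's values: [False, True, False, True]

# Applied to df2, it picks rows 1 and 3 regardless of df2's own areas
wrong = df2[cut]
print(wrong.area.tolist())  # [1, 2]  (neither is > 10)

# What was actually wanted for df2
right = df2[df2.area > 10]
print(right.area.tolist())  # [30, 15]
```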

That is, is it possible to define a cut that refers to no dataframe in particular and can be applied as df = df[cut]?

Answer

I can get something similar

cut = lambda df: df[df.area > 10]
cut(df)
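Because the cut is now a function of the dataframe, it can be applied to any number of dataframes. A sketch using a named `def` (equivalent to the lambda above) and a hypothetical list `frames`:

```python
import pandas as pd


def cut(df):
    """Return only the rows of df whose 'area' is greater than 10."""
    return df[df.area > 10]


# Hypothetical collection of dataframes, each with an 'area' column
frames = [
    pd.DataFrame({"area": [5, 12, 8]}),
    pd.DataFrame({"area": [11, 3]}),
]

filtered = [cut(df) for df in frames]
for f in filtered:
    print(f.area.tolist())
# [12]
# [11]
```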

Per @root, using DataFrame.query:

cut = 'area > 10'
df.query(cut)
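A query string is plain data, so the same string can be reused across dataframes, and `@name` inside the string refers to a Python variable in the calling scope. A sketch assuming a hypothetical `threshold` variable and list `frames`; column names that are not valid Python identifiers would need backtick quoting, which recent pandas versions support:

```python
import pandas as pd

threshold = 10
cut = "area > @threshold"  # '@threshold' is looked up in the calling scope

# Hypothetical collection of dataframes, each with an 'area' column
frames = [
    pd.DataFrame({"area": [5, 12, 8]}),
    pd.DataFrame({"area": [11, 3]}),
]

# A plain loop keeps df.query called directly from the scope that defines threshold
filtered = []
for df in frames:
    filtered.append(df.query(cut))

for f in filtered:
    print(f.area.tolist())
# [12]
# [11]
```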

Per @ayhan, passing a callable to []:

cut = lambda x: x.area > 10
df[cut]
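pandas calls the callable with the dataframe being indexed, so the same object works with both `[]` and `.loc`, and conditions can be combined with `&` inside it. A sketch with made-up data; `big_and_even` is a hypothetical second cut for illustration:

```python
import pandas as pd

# The callable receives the dataframe being indexed and returns a boolean mask
cut = lambda x: x.area > 10

# Hypothetical combined condition; each comparison is parenthesized before '&'
big_and_even = lambda x: (x.area > 10) & (x.area % 2 == 0)

df = pd.DataFrame({"area": [5, 12, 8, 21, 14]})

print(df[cut].area.tolist())               # [12, 21, 14]
print(df.loc[big_and_even].area.tolist())  # [12, 14]
```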

Timing

100 rows

df = pd.DataFrame(np.random.randint(0, 20, 100), columns=['area'])


1,000,000 rows

df = pd.DataFrame(np.random.randint(0, 20, 1000000), columns=['area'])

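To compare the three approaches on your own machine, a sketch using the standard-library `timeit` module. The names `cut_func`, `cut_query`, `cut_callable` and the repeat counts are arbitrary choices, and results will vary with hardware and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

n_rows = 1_000_000  # set to 100 for the small case
df = pd.DataFrame(np.random.randint(0, 20, n_rows), columns=['area'])

# The three reusable cuts from the answer
cut_func = lambda d: d[d.area > 10]
cut_query = 'area > 10'
cut_callable = lambda x: x.area > 10

# Sanity check: all three select exactly the same rows
expected = cut_func(df)
assert expected.equals(df.query(cut_query))
assert expected.equals(df[cut_callable])

candidates = {
    "function": lambda: cut_func(df),
    "query": lambda: df.query(cut_query),
    "callable": lambda: df[cut_callable],
}

timings = {}
for name, stmt in candidates.items():
    # Best of 5 repeats of 10 calls each, converted to seconds per call
    timings[name] = min(timeit.repeat(stmt, number=10, repeat=5)) / 10
    print(f"{name}: {timings[name] * 1e3:.3f} ms per call")
```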