user2478928 - 4 years ago
Python Question

Pandas Dataframe Filtering

Given a DataFrame with two columns, User and Type, how can I filter out a user's entries for a given Type when that user doesn't have at least x entries with that Type?

E.g. I'd like to filter out a user's rows of a given type when there are fewer than 5 occurrences of that type for that user:

User Type
A Alpha
A Alpha
A Alpha
A Alpha
A Alpha
A Beta
A Beta
A Beta
B Alpha
B Alpha
B Alpha
B Alpha
B Alpha


Here I would like to filter out (remove) the rows for user A with the Beta type (only 3 occurrences here), while keeping everything else.

Thanks!

Answer Source

You can group by 'User' and 'Type' and then filter, keeping only the groups that have more than 4 rows (i.e. at least 5):

In [91]:
df.groupby(['User', 'Type']).filter(lambda x: len(x) > 4)

Out[91]:
   User   Type
0     A  Alpha
1     A  Alpha
2     A  Alpha
3     A  Alpha
4     A  Alpha
8     B  Alpha
9     B  Alpha
10    B  Alpha
11    B  Alpha
12    B  Alpha
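
For a self-contained version, here is a minimal sketch that rebuilds the example data from the question and applies the same idea with a boolean mask instead of `filter`. The name `min_count` is a hypothetical stand-in for the "x" in the question, and using `transform('size')` is an alternative to the lambda above, not part of the original answer.

```python
import pandas as pd

# Rebuild the example data from the question:
# A/Alpha x5, A/Beta x3, B/Alpha x5.
df = pd.DataFrame({
    "User": ["A"] * 8 + ["B"] * 5,
    "Type": ["Alpha"] * 5 + ["Beta"] * 3 + ["Alpha"] * 5,
})

min_count = 5  # hypothetical name for the threshold "x" in the question

# Size of each (User, Type) group, broadcast back onto every row.
group_sizes = df.groupby(["User", "Type"])["Type"].transform("size")

# Keep only the rows whose group reaches the threshold.
filtered = df[group_sizes >= min_count]
print(filtered)
```

Both approaches should keep the same rows. The mask version avoids calling a Python lambda once per group, which may matter on large frames with many groups.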