user2333196 - 4 months ago

pandas: filter rows if a name appears in a column more than n times

This is my DataFrame:

import pandas as pd

df = pd.DataFrame({'Col1': ['Joe', 'Bob', 'Joe', 'Joe'],
                   'Col2': [55, 25, 88, 80]})


I only want the names that appear in 'Col1' more than n times (n = 2 in the code below).

I can do it like this:

grouped = df.groupby("Col1")
grouped.filter(lambda x: x["Col1"].count()>2)['Col1'].unique()
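For reference, here is a self-contained version of the groupby/filter approach with the threshold passed in as a parameter. The helper name names_via_filter and the argument n are hypothetical, introduced only for illustration. DataFrameGroupBy.filter keeps every row of each group for which the function returns True, so the remaining rows only need to be collapsed to distinct names.

```python
import pandas as pd


def names_via_filter(df: pd.DataFrame, col: str, n: int) -> list:
    """Hypothetical helper: names in `col` that occur more than `n` times."""
    # filter() keeps all rows of each group whose lambda returns True;
    # len(g) is the number of rows in that group.
    kept = df.groupby(col).filter(lambda g: len(g) > n)
    # Collapse the surviving rows back to the distinct names.
    return list(kept[col].unique())


df = pd.DataFrame({'Col1': ['Joe', 'Bob', 'Joe', 'Joe'],
                   'Col2': [55, 25, 88, 80]})
print(names_via_filter(df, 'Col1', 2))  # ['Joe']
```

It works, but it materialises every qualifying row only to throw them away again, which is part of why it reads as roundabout.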


However, that code is ugly.

Is there a simpler, cleaner way?

Answer

Use value_counts to count each name, then isin to select the rows whose name passes the threshold:

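# Boolean Series indexed by name: True where the name occurs more than twice
vc = df.Col1.value_counts() > 2
# Keep only the True entries; their index holds the qualifying names
vc = vc[vc]

# Select the rows whose Col1 value is one of those names
df.loc[df.Col1.isin(vc.index)]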
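The answer above returns the matching rows, while the question asks for the names themselves. A minimal sketch of both, parameterised by the threshold, follows. The helper names names_more_than and rows_more_than and the argument n are hypothetical. The last lines show an alternative that is not part of the original answer: groupby(...).transform('count') broadcasts each group's count back onto its rows, so the mask can be built without the intermediate index.

```python
import pandas as pd


def names_more_than(df: pd.DataFrame, col: str, n: int) -> list:
    """Hypothetical helper: distinct values of `col` occurring more than `n` times."""
    counts = df[col].value_counts()          # occurrences per name
    return counts[counts > n].index.tolist()  # names above the threshold


def rows_more_than(df: pd.DataFrame, col: str, n: int) -> pd.DataFrame:
    """Hypothetical helper: rows whose `col` value occurs more than `n` times."""
    return df.loc[df[col].isin(names_more_than(df, col, n))]


df = pd.DataFrame({'Col1': ['Joe', 'Bob', 'Joe', 'Joe'],
                   'Col2': [55, 25, 88, 80]})

print(names_more_than(df, 'Col1', 2))  # ['Joe']
print(rows_more_than(df, 'Col1', 2))

# Alternative: give every row its group's count, then compare directly.
mask = df.groupby('Col1')['Col1'].transform('count') > 2
print(df.loc[mask])
```

The value_counts route computes the counts once per distinct name; the transform route is handy when the count itself is also wanted as a column.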
