Amrith Krishna - 4 months ago
Python Question

Filter data with groupby in pandas

I have a DataFrame like the one below, where each row represents one occurrence of a word in an episode of a TV series. If a word appears 3 times in an episode, the pandas DataFrame has 3 rows for it. I need to filter the words so that I only get those that appear 2 or more times. I can do this with groupby, but if a word appears 2 (or 3, 4, 5) times, I need 2 (or 3, 4, 5) rows for it.

With groupby I only get each unique entry and its count, but I need the entry to repeat as many times as it appears in the dialogue. Is there a one-liner to do this?

       dialogue  episode
0         music        1
1   corrections        1
2       somnath        1
3         yadav        5
4          join        2
5     instagram        1
6          wind        2
7         music        1
8    whimpering        2
9         music        1
10         wind        3
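To reproduce this, the sample data can be rebuilt as follows (values copied from the table above). The snippet also shows the problem described: a plain aggregation such as size() collapses each word to a single row with its count, so the repeated rows are lost.

```python
import pandas as pd

# Sample data from the question: one row per occurrence of a word
df = pd.DataFrame({
    "dialogue": ["music", "corrections", "somnath", "yadav", "join",
                 "instagram", "wind", "music", "whimpering", "music", "wind"],
    "episode": [1, 1, 1, 5, 2, 1, 2, 1, 2, 1, 3],
})

# Aggregating gives one row per unique word (indexed by the word),
# not the original repeated rows
counts = df.groupby("dialogue").size()
print(counts)
```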


So here I should ideally get:

    dialogue  episode
0      music        1
6       wind        2
7      music        1
9      music        1
10      wind        3


These are the only two words that appear two or more times.

Answer

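You can use groupby's filter, which returns all of the original rows of every group for which the function returns True (here, len(x) > 1 keeps words that occur at least twice):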

In [11]: df.groupby("dialogue").filter(lambda x: len(x) > 1)
Out[11]:
   dialogue  episode
0     music        1
6      wind        2
7     music        1
9     music        1
10     wind        3
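A self-contained version of the above, with the threshold written as >= 2 to match the question's wording (MIN_COUNT is just an illustrative name). It also includes an alternative that is not part of the original answer: broadcasting each group's size back onto the rows with transform("size") and using it as a boolean mask. That avoids calling a Python lambda once per group, so it is typically faster when there are many distinct words. Both keep the original index and row order.

```python
import pandas as pd

df = pd.DataFrame({
    "dialogue": ["music", "corrections", "somnath", "yadav", "join",
                 "instagram", "wind", "music", "whimpering", "music", "wind"],
    "episode": [1, 1, 1, 5, 2, 1, 2, 1, 2, 1, 3],
})

MIN_COUNT = 2  # illustrative name: keep words seen at least this many times

# groupby(...).filter keeps every row of each group where the lambda is True
kept_filter = df.groupby("dialogue").filter(lambda g: len(g) >= MIN_COUNT)

# Alternative: transform("size") returns each row's group size, aligned
# with df's index, which can be used directly as a boolean mask
group_sizes = df.groupby("dialogue")["episode"].transform("size")
kept_mask = df[group_sizes >= MIN_COUNT]

print(kept_filter)
```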