pythonnewb - 2 years ago 90
Python Question

# More efficient way to find top values in pandas dataframe column

I have a df with two columns x and y . Column y is cum count of x values. x values have different counts. How do I get a result df of top two y counts for each x without iterating through rows.

Example df:

``````df = pd.DataFrame({"x": [101, 101, 101, 101, 201, 201, 201, 405, 405], "y": [1, 2, 3, 4, 1, 2, 3, 1, 2]})

x  y
0  101  1
1  101  2
2  101  3
3  101  4
4  201  1
5  201  2
6  201  3
7  405  1
8  405  2
``````

Desired result:

``````x      y
101    3
101    4
201    2
201    3
405    1
405    2
``````

You can do it this way:

``````In [35]:
df.loc[df.groupby(['x'])['y'].apply(lambda x: x.iloc[-2:]).index.get_level_values(1)]

Out[35]:
x  y
2  101  3
3  101  4
5  201  2
6  201  3
7  405  1
8  405  2
``````

So this `groupby` on 'x' column and returns the last 2 values, assuming that the df is already sorted as you've shown. This produces a df with a multindex and the second level values can be used to index back into the original df by using `get_level_values`

EDIT

To answer your comment you can `groupby` again and use `transform` with `rank` to reset the values to `1` and `2`:

``````In [51]:
df1 = df.loc[df.groupby(['x'])['y'].apply(lambda x: x.iloc[-2:]).index.get_level_values(1)]
df1['y'] = df1.groupby('x')['y'].transform(lambda x: x.rank(method='first'))
df1
​
Out[51]:
x  y
2  101  1
3  101  2
5  201  1
6  201  2
7  405  1
8  405  2
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download