user1367204 user1367204 - 1 year ago 31
Python Question

How to do a complex Groupyby in pandas?

I have a pandas dataframe like:

a b id
1 10 6 1
2 6 -3 1
3 -3 12 1 # id is 1, but needs a tie breaker
4 -2 12 1 # id is 1, but needs a tie breaker
5 4 8 2
6 12 11 2
7 3 -5 2
8 3 -5 2


How would make a new dataframe which first takes the
id
column and then gets everytime that the column
b
is over 10, and if there are multiple rows that meet this criterion, then do a tie breaker by selecting the row with the minimum value in column
a
:

a b id
1 -3 12 1
2 12 11 2


I have a dataframe with like 2,000,000 rows and about 10,000
id
values, so a for-loop is very slow.

Answer Source

Here is one solution, if df is your original DataFrame:

df2 = df[df['b'] > 10]
out = df2.loc[df2.groupby('id')['a'].idxmin()]

and you should get:

    a   b  id
3  -3  12   1
6  12  11   2
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download