matthew matthew - 1 month ago 8
Python Question

Keep the record with the largest series for each id in Python

I want to keep the one record that has the largest series value for each id, so the result should have one row per id. I think I need something like

df_new = df.groupby('id')['series'].nlargest(1)

but that's definitely wrong.

This is what my dataset looks like:

id  series  s1  s2  s3
 1       2   4   9   1
 1       8   6   2   2
 1       3   9   1   3
 2       9   4   1   5
 2       2   2   5   5
 2       5   1   7   8
 3       6   7   2   3
 3       2   4   4   1
 3       1   3   9   9


This should be the result:

id  series  s1  s2  s3
 1       8   6   2   2
 2       9   4   1   5
 3       6   7   2   3

Answer

Another solution is to sort with sort_values and then aggregate each group with first:

# Sort by series in descending order, then keep the first row of each id group
df = df.sort_values(by="series", ascending=False).groupby("id", as_index=False).first()
print(df)
   id  series  s1  s2  s3
0   1       8   6   2   2
1   2       9   4   1   5
2   3       6   7   2   3
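
For comparison, here is a minimal sketch of a different approach that is not part of the answer above: look up the index label of the largest series per id with idxmax, then select those rows with loc. The DataFrame is rebuilt from the sample data in the question so the snippet runs on its own; the names idx and result are only illustrative. One difference to be aware of: first() takes the first non-null value of each column separately, so if the data contained NaN it could combine values from different rows, whereas idxmax with loc always returns whole original rows.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "series": [2, 8, 3, 9, 2, 5, 6, 2, 1],
    "s1":     [4, 6, 9, 4, 2, 1, 7, 4, 3],
    "s2":     [9, 2, 1, 1, 5, 7, 2, 4, 9],
    "s3":     [1, 2, 3, 5, 5, 8, 3, 1, 9],
})

# For each id, find the index label of the row with the largest series value
idx = df.groupby("id")["series"].idxmax()

# Select those complete rows and renumber the index from 0
result = df.loc[idx].reset_index(drop=True)
print(result)
```

If several rows of an id tie for the largest series, idxmax picks the first of them, so the result still has exactly one row per id.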