user1895076 user1895076 - 1 month ago 9
SQL Question

Python equivalent of SQL: SELECT w/ MAX() and GROUP BY

I have data like this:

df = pd.DataFrame( {
'ID': [1,1,2,3,3,3,4],
'SOME_NUM': [8,10,2,4,0,5,1]
} );

df
ID SOME_NUM
0 1 8
1 1 10
2 2 2
3 3 4
4 3 0
5 3 5
6 4 1


And I want to group by the ID column while retaining the maximum value of SOME_NUM as a separate column. This would be easy in SQL:

SELECT ID,
MAX(SOME_NUM)
FROM DF
GROUP BY ID;


But I'm having trouble finding the equivalent Python code. Seems like this should be easy. Anyone have a solution?

Desired result:

new_df
ID SOME_NUM
0 1 10
1 2 2
2 3 5
6 4 1

Answer

Seeing as how you are using Pandas... use the groupby functionality baked in

df.groupby("ID").max()