Fraz Fraz - 4 months ago 9
Python Question

finding the max per id and creating a new column in pandas

So, I have a pandas dataframe like this:

id, counts
1, 20
1, 21
1,15
1, 24
2,12
2,42
2,9
3,43
...

id, counts, label
1, 20, 0
1, 21, 0
1,15, 0
1, 24, 1 # because 24 is the highest count for id 1
2,12, 0
2,42, 1 # because 42 is the highest count for id 2
2,9, 0
3,43,
...


How do i do this in using pandas

Answer

This seems to work:

df['label'] = 0
df['label'].iloc[df.groupby('id').apply(lambda x: x['counts'].argmax()).values] = 1

But it is so ugly! And does not follow good coding practices... I'll try to improve it.


If you like the below line, upvote this answer (Merlin's answer to this question) to say thanks.

df['label'] = np.where(df.index.isin((df.groupby('id')['counts'].idxmax())), 1, 0)

IMHO, you should use Merlin's answer to solve this problem. Mine is not good coding practice and will scale poorly compared to Merlin's