lee huang - 6 months ago
Python Question

How to remove duplicate values in a dataset: Python

I want to remove duplicate items in a dataset, keeping only the ones with the highest value. I am currently using pandas:

c_maxes = hospProfiling.groupby(['Hospital_ID', 'District_ID'], group_keys=False)\
    .apply(lambda x: x.loc[x['Hospital_employees'].idxmax()])
print(c_maxes)


Doing this turns the initial dataset into one in which the columns used for grouping are duplicated. What is the error here?


Why not use the groupby max method?

hospProfiling.groupby(['Hospital_ID', 'District_ID'], as_index=False).max()
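As a minimal sketch of that call, here it is on a made-up frame. The column names come from the question, but the sample values are assumptions for illustration only, not the asker's data.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
hospProfiling = pd.DataFrame({
    "Hospital_ID": [1, 1, 2, 2],
    "District_ID": [10, 10, 20, 20],
    "Hospital_employees": [50, 80, 30, 20],
})

# as_index=False keeps the grouping keys as ordinary columns,
# so each of them appears exactly once in the result.
c_maxes = hospProfiling.groupby(
    ["Hospital_ID", "District_ID"], as_index=False
).max()
print(c_maxes)
```

With this sample, the result has one row per (Hospital_ID, District_ID) pair and no repeated key columns.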

If you have more than three columns, replace max with agg so that only the column of interest is aggregated:

hospProfiling.groupby(['Hospital_ID', 'District_ID'], as_index=False).agg({'Hospital_employees': 'max'})
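When there are extra columns, it may help to compare agg with an alternative that keeps each winning row whole. The sketch below is not the answer's method: it reuses idxmax from the question, and the `Beds` column and all sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; "Beds" is an invented extra column.
hosp = pd.DataFrame({
    "Hospital_ID": [1, 1, 2, 2],
    "District_ID": [10, 10, 20, 20],
    "Hospital_employees": [50, 80, 30, 20],
    "Beds": [5, 3, 7, 9],
})
keys = ["Hospital_ID", "District_ID"]

# agg on a single column: the result keeps the keys plus that column only.
agg_only = hosp.groupby(keys, as_index=False).agg({"Hospital_employees": "max"})

# Alternative: find the row label of the maximum in each group,
# then select those rows so every other column stays consistent.
idx = hosp.groupby(keys)["Hospital_employees"].idxmax()
best_rows = hosp.loc[idx].reset_index(drop=True)

print(agg_only)
print(best_rows)
```

Note that a plain `.max()` takes the maximum of every column independently, so with extra columns it can combine values from different rows. Selecting by idxmax avoids that.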