sanaz sanaz - 3 months ago 8
Python Question

top n columns with highest value in each pandas dataframe row

I have the following dataframe:

id p1 p2 p3 p4
1 0 9 1 4
2 0 2 3 4
3 1 3 10 7
4 1 5 3 1
5 2 3 7 10


I need to reshape the data frame in a way that for each id it will have the top 3 columns with the highest values. The result would be like this:

id top1 top2 top3
1 p2 p4 p3
2 p4 p3 p2
3 p3 p4 p2
4 p2 p3 p4/p1
5 p4 p3 p2


It shows the top 3 best sellers for every
user_id
. I have already done it using the
dplyr
package in R, but I am looking for the pandas equivalent.

Answer

You can use:

df = df.set_index('id').apply(lambda x: pd.Series(x.sort_values(ascending=False)
       .iloc[:3].index, 
      index=['top1','top2','top3']), axis=1).reset_index()
print (df)
   id top1 top2 top3
0   1   p2   p4   p3
1   2   p4   p3   p2
2   3   p3   p4   p2
3   4   p2   p3   p4
4   5   p4   p3   p2
Comments