Python Question

Get both the top-n values and the names of columns they occur in, within each row in dataframe

I have a dataframe like in this one:

df = pd.DataFrame({'a':[1,2,1],'b':[4,6,0],'c':[0,4,8]})
+---+---+---+
| a | b | c |
+---+---+---+
| 1 | 4 | 0 |
+---+---+---+
| 2 | 6 | 4 |
+---+---+---+
| 1 | 0 | 8 |
+---+---+---+


for each row, I need (both) the 'n' (in this case two) highest values and the corresponding column in descending order:

row 1: 'b':4,'a':1
row 2: 'b':6,'c':4
row 3: 'c':8,'a':1

Answer

Here are two ways, both adapt from @unutbu's answer from Find names of top-n highest-value columns in each pandas dataframe row

  1. Use Python Decorate-Sort-Undecorate on each row to insert the column names, do np.argsort, keep the top-n, reformat the answer. (I think this is cleaner)

  2. Get the matrix of topnlocs as follows, then use it both to reindex into df.columns, and df.values, and combine that output.

;

import numpy as np

nlargest = 2
topnlocs = np.argsort(-df.values, axis=1)[:, 0:nlargest]
# ... now you can use topnlocs to reindex into df.columns, and df.values