Mari Mari - 10 months ago 128
Python Question

Largest (n) numbers with Index and Column name in Pandas DataFrame

I wish to find out the largest 5 numbers in a DataFrame and store the Index name and Column name for these 5 values.

I am trying to use nlargest() and idxmax methods but failing to achieve what i want. My code is as below:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
df = DataFrame({'a': [1, 10, 8, 11, -1],'b': [1.0, 2.0, 6, 3.0, 4.0],'c': [1.0, 2.0, 6, 3.0, 4.0]})

Can you kindly let me know How can i achieve this. Thank you

Answer Source

Use stack and nlargest:

max_vals = df.stack().nlargest(5)

This will give you a Series with a multiindex, where the first level is the original DataFrame's index, and the second level is the column name for the given value. Here's what max_vals looks like:

3  a    11.0
1  a    10.0
2  a     8.0
   b     6.0
   c     6.0

To explicitly get the index and column names, use get_level_values on the index of max_vals:

max_idx = max_vals.index.get_level_values(0)
max_cols = max_vals.index.get_level_values(1)

The result of max_idx:

Int64Index([3, 1, 2, 2, 2], dtype='int64')

The result of max_cols:

Index(['a', 'a', 'a', 'b', 'c'], dtype='object')