getglad getglad - 5 months ago 52
Python Question

Get index of the minimum of multi-index Pandas DataFrame using level

I have a Pandas DataFrame that is multiindexed and want to find the minimum value of a certain column in a subset of rows on each level, and get the entire contents of those rows.

import pandas as pd

idx = pd.MultiIndex.from_product([['v1', 'v2'],
['record' + str(i) for i in range(1, 7)]])

df = pd.DataFrame([[2., 114], [2., 1140],
[3., 114], [3., 1140],
[5., 114], [5., 1140],
[2., 114], [2., 1140],
[3., 114], [3., 1140],
[5., 114], [5., 1140]],
columns=['col1', 'col2'],
index=idx)


My structure:

col1 col2
level1 level2
v1 record1 2.0 114
record2 2.0 1140
record3 3.0 114
record4 3.0 1140
record5 5.0 114
record6 5.0 1140
v2 record1 2.0 114
record2 2.0 1140
record3 3.0 114
record4 3.0 1140
record5 5.0 114
record6 5.0 1140





Example desired output I want the minimum value of another column where
col1 == 5
:


col1 col2
level1 level2
v1 record5 5.0 114
v2 record5 5.0 114





I know that I can get a subset of rows by using a comparison statement.

df.ix[df['col1'] == 5]


And I also know that I can get the minimum values of a column within that subset from all levels.

df['col2'][df['col1'] == 5].min(level='level1')


And if I want to specify the level, then I can get the index of 1 row on specific level.

df.ix['v1', pay_up_file.ix['v1']['col2'][(df.ix['v1']['col1'] == 5)].idxmin()]


But I cannot figure out if there is an efficient way to get the indexes from all levels

There does not seem to be a method available along the lines of this:

df['col2'][df['col1'] == 5].idxmin(level='level1')


I can get to what I want with this:

df.ix[
(df['col1'] == 5) &
(df['col2'].isin(df['col2'][df['col1'] == 5].min(level='level1').values))
]


But with everything else that is in
Pandas
, is there a better way to get to my output?

Answer

This should work:

df.loc[df.loc[df.col1 == 5.].groupby(level=0).col2.idxmin()]

            col1  col2
v1 record5   5.0   114
v2 record5   5.0   114

Note

I'm using idxmin as you thought you ought to. But the context matters. I'm using it following a groupby(level=0).col2.idxmin() which acts as you thought col2.idxmin(level=...) should.