Juan Carlos Juan Carlos - 30 days ago 14
Python Question

Pandas DF MultiIndex: how to select a column knowing only the second index?

I'm working with the following DataFrame:

age height weight shoe_size
0 8.0 6.0 2.0 1.0
1 8.0 NaN 2.0 1.0
2 6.0 1.0 4.0 NaN
3 5.0 1.0 NaN 0.0
4 5.0 NaN 1.0 NaN
5 3.0 0.0 1.0 0.0


I added another header to the df in this way:

zipped = list(zip(df.columns, ["RHS", "height", "weight", "shoe_size"]))

df.columns = pd.MultiIndex.from_tuples(zipped)


So this is the new DataFrame:

age height weight shoe_size
RHS height weight shoe_size
0 8.0 6.0 2.0 1.0
1 8.0 NaN 2.0 1.0
2 6.0 1.0 4.0 NaN
3 5.0 1.0 NaN 0.0
4 5.0 NaN 1.0 NaN
5 3.0 0.0 1.0 0.0


Now I know how to select the first column, by using the corresponding tuple
("age", "RHS")
:

df[("age", "RHS")]


but I was wondering about how to do this by using only the second index "RHS".
Ideally something like:

df[(any, "RHS")]

Answer Source

You pass slice(None) as the first argument to .loc, provided you sort your columns first using df.sort_index:

In [325]: df.sort_index(1).loc[:, (slice(None), 'RHS')]
Out[325]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

You can also use pd.IndexSlice with df.loc:

In [332]: idx = pd.IndexSlice

In [333]: df.sort_index(1).loc[:, idx[:, 'RHS']]
Out[333]: 
   age
   RHS
0  8.0
1  8.0
2  6.0
3  5.0
4  5.0
5  3.0

With the slicer, you don't need to explicitly pass slice(None) because IndexSlice does that for you.


If you don't sort your columns, you get:

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

If you have multiple RHS columns in the second level, all those columns are returned.