Ross R Ross R - 1 month ago 29
Python Question

Python Pandas - How to flatten a hierarchical index in columns

I have a data frame with a hierarchical index in axis 1 (columns) (from a groupby.agg operation):

USAF WBAN year month day s_PC s_CL s_CD s_CNT tempf
sum sum sum sum amax amin
0 702730 26451 1993 1 1 1 0 12 13 30.92 24.98
1 702730 26451 1993 1 2 0 0 13 13 32.00 24.98
2 702730 26451 1993 1 3 1 10 2 13 23.00 6.98
3 702730 26451 1993 1 4 1 0 12 13 10.04 3.92
4 702730 26451 1993 1 5 3 0 10 13 19.94 10.94


I want to flatten it, so that it looks like this (names aren't critical - I could rename):

USAF WBAN year month day s_PC s_CL s_CD s_CNT tempf_amax tmpf_amin
0 702730 26451 1993 1 1 1 0 12 13 30.92 24.98
1 702730 26451 1993 1 2 0 0 13 13 32.00 24.98
2 702730 26451 1993 1 3 1 10 2 13 23.00 6.98
3 702730 26451 1993 1 4 1 0 12 13 10.04 3.92
4 702730 26451 1993 1 5 3 0 10 13 19.94 10.94


How do I do this? (I've tried a lot, to no avail.)

Per a suggestion, here is the head in dict form

{('USAF', ''): {0: '702730',
1: '702730',
2: '702730',
3: '702730',
4: '702730'},
('WBAN', ''): {0: '26451', 1: '26451', 2: '26451', 3: '26451', 4: '26451'},
('day', ''): {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
('month', ''): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
('s_CD', 'sum'): {0: 12.0, 1: 13.0, 2: 2.0, 3: 12.0, 4: 10.0},
('s_CL', 'sum'): {0: 0.0, 1: 0.0, 2: 10.0, 3: 0.0, 4: 0.0},
('s_CNT', 'sum'): {0: 13.0, 1: 13.0, 2: 13.0, 3: 13.0, 4: 13.0},
('s_PC', 'sum'): {0: 1.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 3.0},
('tempf', 'amax'): {0: 30.920000000000002,
1: 32.0,
2: 23.0,
3: 10.039999999999999,
4: 19.939999999999998},
('tempf', 'amin'): {0: 24.98,
1: 24.98,
2: 6.9799999999999969,
3: 3.9199999999999982,
4: 10.940000000000001},
('year', ''): {0: 1993, 1: 1993, 2: 1993, 3: 1993, 4: 1993}}

Answer

I think the easiest way to do this would be to set the columns to the top level:

df.columns = df.columns.get_level_values(0)

Note: if the to level has a name you can also access it by this, rather than 0.

.

If you want to combine/join your MultiIndex into one Index (assuming you have just string entries in your columns) you could:

df.columns = [' '.join(col).strip() for col in df.columns.values]

Note: we must strip the whitespace for when there is no second index.

In [11]: [' '.join(col).strip() for col in df.columns.values]
Out[11]: 
['USAF',
 'WBAN',
 'day',
 'month',
 's_CD sum',
 's_CL sum',
 's_CNT sum',
 's_PC sum',
 'tempf amax',
 'tempf amin',
 'year']