Dance Party2 Dance Party2 - 5 months ago 33
Python Question

Pandas Sort Multiindex by Group Sum

Given the following data frame:

import pandas as pd
df=pd.DataFrame({'County':['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B'],
'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
'2013','2013','2013','2013','2013','2013','2013','2013']})
d2=pd.pivot_table(df,index=['County','Hospital'],columns=['Year'])#.sort_columns

d2
Enrollment
Year 2012 2013
County Hospital
A a 44.0 NaN
c NaN 1.0
d NaN 89.0
e 88.0 NaN
B a 54.0 54.0
b 55.0 NaN
e NaN 71.5
C a NaN 34.0
b 27.0 65.0
c 42.0 NaN
D b NaN 12.0
c 55.0 23.0
d 57.0 NaN


I need to sort the data frame such that County is sorted descendingly by the sum of Enrollment for the most recent year (I want to avoid using '2013' directly) like this:

Enrollment
Year 2012 2013
County Hospital
B a 54 54
b 55 NaN
e NaN 71.5
C a NaN 34
b 27 65
c 42 NaN
A a 44 NaN
c NaN 1
d NaN 89
e 88 NaN
D b NaN 12
c 55 23
d 57 NaN


Then, I'd like each hospital sorted within each county, descendingly, but 2013 enrollments like this:

Enrollment
Year 2012 2013
County Hospital
B e NaN 71.5
a 54 54
b 55 NaN
C b 27 65
a NaN 34
c 42 NaN
A d NaN 89
c NaN 1
a 44 NaN
e 88 NaN
D c 55 23
b NaN 12
d 57 NaN


So far, I've tried using groupby to get the sums and merge the back but have not had any luck:

d2.groupby('County').sum()


Thanks in advance!

Answer

You could:

max_col = max(d2.columns.get_level_values(1)) # get column 2013
d2['sum'] = d2.groupby(level='County').transform('sum').loc[:, ('Enrollment', max_col)]
d2 = d2.sort_values(['sum', ('Enrollment', max_col)], ascending=[False, False])

to get:

                Enrollment          sum
Year                  2012  2013       
County Hospital                        
B      e               NaN  71.5  125.5
       a              54.0  54.0  125.5
       b              55.0   NaN  125.5
C      b              27.0  65.0   99.0
       a               NaN  34.0   99.0
       c              42.0   NaN   99.0
A      d               NaN  89.0   90.0
       c               NaN   1.0   90.0
       a              44.0   NaN   90.0
       e              88.0   NaN   90.0
D      c              55.0  23.0   35.0
       b               NaN  12.0   35.0
       d              57.0   NaN   35.0