Dance Party2 - 1 year ago 135
Python Question

# Pandas Sort Multiindex by Group Sum

Given the following data frame:

``````import pandas as pd
df=pd.DataFrame({'County':['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B'],
'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
'2013','2013','2013','2013','2013','2013','2013','2013']})
d2=pd.pivot_table(df,index=['County','Hospital'],columns=['Year'])#.sort_columns

d2
Enrollment
Year   2012     2013
County  Hospital
A       a      44.0     NaN
c      NaN      1.0
d      NaN      89.0
e      88.0     NaN
B       a      54.0     54.0
b      55.0     NaN
e      NaN      71.5
C       a      NaN      34.0
b      27.0     65.0
c      42.0     NaN
D       b      NaN      12.0
c      55.0     23.0
d      57.0     NaN
``````

I need to sort the data frame such that County is sorted descendingly by the sum of Enrollment for the most recent year (I want to avoid using '2013' directly) like this:

``````        Enrollment
Year          2012  2013
County  Hospital
B       a         54    54
b         55    NaN
e         NaN   71.5
C       a         NaN   34
b         27    65
c         42    NaN
A       a         44    NaN
c         NaN   1
d         NaN   89
e         88    NaN
D       b         NaN   12
c         55    23
d         57    NaN
``````

Then, I'd like each hospital sorted within each county, descendingly, but 2013 enrollments like this:

``````        Enrollment
Year    2012    2013
County  Hospital
B       e       NaN 71.5
a       54  54
b       55  NaN
C       b       27  65
a       NaN 34
c       42  NaN
A       d       NaN 89
c       NaN 1
a       44  NaN
e       88  NaN
D       c       55  23
b       NaN 12
d       57  NaN
``````

So far, I've tried using groupby to get the sums and merge the back but have not had any luck:

``````d2.groupby('County').sum()
``````

You could:

``````max_col = max(d2.columns.get_level_values(1)) # get column 2013
d2['sum'] = d2.groupby(level='County').transform('sum').loc[:, ('Enrollment', max_col)]
d2 = d2.sort_values(['sum', ('Enrollment', max_col)], ascending=[False, False])
``````

to get:

``````                Enrollment          sum
Year                  2012  2013
County Hospital
B      e               NaN  71.5  125.5
a              54.0  54.0  125.5
b              55.0   NaN  125.5
C      b              27.0  65.0   99.0
a               NaN  34.0   99.0
c              42.0   NaN   99.0
A      d               NaN  89.0   90.0
c               NaN   1.0   90.0
a              44.0   NaN   90.0
e              88.0   NaN   90.0
D      c              55.0  23.0   35.0
b               NaN  12.0   35.0
d              57.0   NaN   35.0
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download