codingknob codingknob - 3 months ago 6
Python Question

drop labels created by agg(['sum','count'])

I have a df that can be reconstructed as pd.DataFrame(dict):

city d1 d2 d3 d4
date
2014-05-01 sfo 4.26 6.58 2.32 -4.87
2014-05-01 yyz 2.90 6.64 24.78 -50.55
2014-05-01 yvr 2.90 6.64 24.78 -50.55
2014-05-01 dfw 4.06 6.54 2.40 -4.06
2014-05-01 pdx 9.96 6.66 30.35 64.24


dict:

{'date': {0: pd.Timestamp('2014-05-01 00:00:00'),
1: pd.Timestamp('2014-05-01 00:00:00'),
2: pd.Timestamp('2014-05-01 00:00:00'),
3: pd.Timestamp('2014-05-01 00:00:00'),
4: pd.Timestamp('2014-05-01 00:00:00')},
'city': {0: 'sfo', 1: 'yyz', 2: 'yvr', 3: 'dfw', 4: 'pdx'},
'd1': {0: 4.2599999999999998,
1: 2.8999999999999999,
2: 2.8999999999999999,
3: 4.0599999999999996,
4: 9.9600000000000009},
'd2': {0: 6.5800000000000001,
1: 6.6399999999999997,
2: 6.6399999999999997,
3: 6.54,
4: 6.6600000000000001},
'd3': {0: 2.3199999999999998,
1: 24.780000000000001,
2: 24.780000000000001,
3: 2.3999999999999999,
4: 30.350000000000001},
'd4': {0: -4.8700000000000001,
1: -50.549999999999997,
2: -50.549999999999997,
3: -4.0599999999999996,
4: 64.239999999999995}}

df.set_index(['date'], inplace=True)


I perform the following aggregate via
TimeGrouper
:

grouped = df.groupby(['city', pd.TimeGrouper('M')])
monthly_agg = grouped.agg(['sum', 'count'])


monthly_agg looks like:

d1 d2 d3 d4
sum count sum count sum count sum count
city date
dfw 2014-05-31 4.06 1 6.54 1 2.40 1 -4.06 1
pdx 2014-05-31 9.96 1 6.66 1 30.35 1 64.24 1
sfo 2014-05-31 4.26 1 6.58 1 2.32 1 -4.87 1
yvr 2014-05-31 2.90 1 6.64 1 24.78 1 -50.55 1
yyz 2014-05-31 2.90 1 6.64 1 24.78 1 -50.55 1


The
count
label column is used for sanity checking but once that is done, I want to be able to drop it.

As well, the
sum
label under d1,d2,d3,d4 etc is no longer necessary

My desired output:

d1 d2 d3 d4
city date
dfw 2014-05-31 4.06 6.54 2.40 -4.06
pdx 2014-05-31 9.96 6.66 30.35 64.24
sfo 2014-05-31 4.26 6.58 2.32 -4.87
yvr 2014-05-31 2.90 6.64 24.78 -50.55
yyz 2014-05-31 2.90 6.64 24.78 -50.55


How do I get this?

Answer
monthly_agg = monthly_agg.loc[:, pd.IndexSlice[:,'sum']]
monthly_agg.columns = monthly_agg.columns.droplevel(1)
monthly_agg
Out: 
                   d1    d2     d3     d4
city date                                
dfw  2014-05-31  4.06  6.54   2.40  -4.06
pdx  2014-05-31  9.96  6.66  30.35  64.24
sfo  2014-05-31  4.26  6.58   2.32  -4.87
yvr  2014-05-31  2.90  6.64  24.78 -50.55
yyz  2014-05-31  2.90  6.64  24.78 -50.55

Or with xs:

monthly_agg.xs('sum', axis=1, level=1)
Out: 
                   d1    d2     d3     d4
city date                                
dfw  2014-05-31  4.06  6.54   2.40  -4.06
pdx  2014-05-31  9.96  6.66  30.35  64.24
sfo  2014-05-31  4.26  6.58   2.32  -4.87
yvr  2014-05-31  2.90  6.64  24.78 -50.55
yyz  2014-05-31  2.90  6.64  24.78 -50.55