Sam Gregson - 3 months ago
Python Question

Accessing columns after using pandas groupby (for plotting and further calculation)

I am using the df.groupby() method:

g1 = df[['md', 'agd', 'hgd']].groupby(['md']).agg(['mean', 'count', 'std'])


It produces exactly what I want!

          agd                         hgd
         mean  count       std       mean  count       std
md
-4   1.398350      2  0.456494  -0.418442      2  0.774611
-3  -0.281814     10  1.314223  -0.317675     10  1.161368
-2  -0.341940     38  0.882749   0.136395     38  1.240308
-1  -0.137268    125  1.162081  -0.103710    125  1.208362
0   -0.018731    603  1.108109  -0.059108    603  1.252989
1   -0.034113    178  1.128363  -0.042781    178  1.197477
2    0.118068     43  1.107974   0.383795     43  1.225388
3    0.452802     18  0.805491  -0.335087     18  1.120520
4    0.304824      1       NaN  -1.052011      1       NaN


However, I now want to access the groupby object columns like a "normal" dataframe.

I will then be able to:
1) calculate the errors on the agd and hgd means
2) make scatter plots of md (x axis) vs. the agd mean (and the hgd mean), with appropriate error bars added.

Is this possible? Perhaps by playing with the indexing?

Thanks in advance!

Answer

1) You can rename the columns and proceed as normal (this gets rid of the multi-indexing):

g1.columns = ['agd_mean', 'agd_count', 'agd_std', 'hgd_mean', 'hgd_count', 'hgd_std']
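
As a rough sketch of how this could look end to end: the snippet below builds the flat names from the two column levels instead of typing them out, then computes the standard error of each mean as std / sqrt(count). The small df is made-up toy data standing in for the real one, and the "_sem" column names are just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the original df (not the asker's values).
df = pd.DataFrame({
    "md":  [0, 0, 0, 1, 1, 1],
    "agd": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
    "hgd": [0.0, 1.0, 2.0, 1.0, 1.0, 4.0],
})

g1 = df[["md", "agd", "hgd"]].groupby("md").agg(["mean", "count", "std"])

# Each column label is a tuple such as ("agd", "mean"); join it into "agd_mean".
g1.columns = ["_".join(col) for col in g1.columns]

# Standard error of the mean: std / sqrt(count).
# Groups with a single row have std = NaN, so their error is NaN as well.
for var in ("agd", "hgd"):
    g1[f"{var}_sem"] = g1[f"{var}_std"] / np.sqrt(g1[f"{var}_count"])

print(g1[["agd_mean", "agd_sem", "hgd_mean", "hgd_sem"]])
```

After this, g1 behaves like any flat dataframe, e.g. g1['agd_mean'] or g1.agd_sem.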

2) You can keep the multi-indexing and select on both column levels in turn (see the pandas docs on MultiIndex):

g1['agd'][['mean', 'count']]
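
For the plotting part of the question, something along these lines may work while keeping the two column levels. Note that the result of .agg() is already an ordinary DataFrame, just with MultiIndex columns. The df here is made-up toy data, and plot_agd is a hypothetical helper name; it assumes matplotlib is installed and that you pass it an Axes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the original df (not the asker's values).
df = pd.DataFrame({
    "md":  [0, 0, 0, 1, 1, 1],
    "agd": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
    "hgd": [0.0, 1.0, 2.0, 1.0, 1.0, 4.0],
})

g1 = df.groupby("md")[["agd", "hgd"]].agg(["mean", "count", "std"])

# One column, selected with a (top level, second level) tuple.
agd_mean = g1[("agd", "mean")]

# Several columns under "agd": pick the top level first, then a list of names.
agd_stats = g1["agd"][["mean", "count"]]

# The "mean" column of every variable at once, via a cross-section on level 1.
all_means = g1.xs("mean", axis=1, level=1)

# Error on the agd mean: std / sqrt(count).
agd_sem = g1[("agd", "std")] / np.sqrt(g1[("agd", "count")])


def plot_agd(ax):
    """Draw md (x axis) vs. agd mean with standard-error bars on a matplotlib Axes."""
    ax.errorbar(g1.index, agd_mean, yerr=agd_sem, fmt="o")
    ax.set_xlabel("md")
    ax.set_ylabel("agd mean")
```

With matplotlib available, you would create an Axes with plt.subplots(), call plot_agd(ax), and then plt.show(); the same pattern applies to hgd by swapping the top-level name.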