Alcott Alcott - 5 months ago 18
Python Question

Python pandas groupby aggregation

I have a DataFrame df, composed of (age, height). I want to see how the mean of height changes with age, so I group df by age and try to form a new DataFrame new_df, composed of (age, mean_height). My code is below:

groups = df.groupby('age')
new_df = groups.agg({'height' : np.mean,
'age' : # HOW to add age?})


But I don't know how to add age to new_df. I hope someone can give me some advice.

Answer

age becomes the index of the aggregated DataFrame:

In [95]: df = DataFrame({'age':[10,10,20,20,20], 'height':[140,150,145, 190,200]})

In [96]: df
Out[96]: 
   age  height
0   10     140
1   10     150
2   20     145
3   20     190
4   20     200

In [97]: groups = df.groupby('age')

In [98]: groups.agg({'height':np.mean})
Out[98]: 
         height
age            
10   145.000000
20   178.333333

df.groupby('age').mean() would achieve the same result. If you want age as a regular column rather than the index, add a call to reset_index().
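For illustration, here is a minimal sketch of the reset_index() route, using the same sample data as above. The rename to mean_height is my assumption about the column name you want, taken from the (age, mean_height) layout in the question; it is not required by pandas.

```python
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame({'age': [10, 10, 20, 20, 20],
                   'height': [140, 150, 145, 190, 200]})

# Group by age and take the mean height; age is the index at this point.
mean_by_age = df.groupby('age')['height'].mean()

# reset_index() moves age out of the index and into an ordinary column.
new_df = mean_by_age.reset_index()

# Assumed column name, chosen to match the question's (age, mean_height).
new_df = new_df.rename(columns={'height': 'mean_height'})

print(new_df)
```

After this, new_df has two ordinary columns, age and mean_height, and a default integer index.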

As an alternative, you can call the groupby with as_index=False:

groups = df.groupby('age', as_index=False)
groups.agg({'height': np.mean})
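A self-contained sketch of the as_index=False variant, again with the sample data from above. I pass the string 'mean' instead of np.mean here; that is a substitution on my part, since both compute the same aggregate and recent pandas versions may warn when given the NumPy function.

```python
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame({'age': [10, 10, 20, 20, 20],
                   'height': [140, 150, 145, 190, 200]})

# as_index=False keeps the grouping key (age) as a regular column
# instead of turning it into the index of the result.
result = df.groupby('age', as_index=False).agg({'height': 'mean'})

print(result)
```

Here result already contains age as its first column, so no reset_index() call is needed.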