eleanora - 2 months ago
Python Question

How to divide the sum with the size in a pandas groupby

I have a dataframe like

  ID_0 ID_1  ID_2
0    a    b     1
1    a    c     1
2    a    b     0
3    d    c     0
4    a    c     0
5    a    c     1


I would like to group by ['ID_0', 'ID_1'] and produce a new dataframe that contains, for each group, the sum of the ID_2 values divided by the number of rows in that group.

grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())


gives

           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0
ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64


How can I get the new dataframe with the np.sum values divided by the size() values?

Answer

Use groupby.apply instead:

df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum()/len(x))

ID_0  ID_1
a     b       0.500000
      c       0.666667
d     c       0.000000
dtype: float64
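
As a possible alternative, the same ratio can be computed without apply, by dividing the grouped sum by the grouped size directly. The sketch below rebuilds the sample frame from the question so it runs on its own; the names ratio, mean and result are just illustrative. It also assumes ID_2 has no missing values: in that case sum divided by size is simply the group mean, but with NaN present the two would differ, because size() counts every row while mean() skips NaN.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

grouped = df.groupby(['ID_0', 'ID_1'])

# Explicit form: per-group sum divided by per-group row count.
# Both Series are indexed by (ID_0, ID_1), so the division aligns by group.
ratio = grouped['ID_2'].sum() / grouped.size()

# Shortcut that gives the same numbers when ID_2 has no missing values.
mean = grouped['ID_2'].mean()

# reset_index turns the Series into a regular DataFrame
# with ID_0 and ID_1 as ordinary columns.
result = ratio.reset_index(name='ratio')
print(result)
```

The vectorised form avoids calling a Python lambda once per group, which tends to matter on larger frames.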