eleanora - 9 months ago 67

Python Question

I have a dataframe like this:

```
   ID_0 ID_1  ID_2
0     a    b     1
1     a    c     1
2     a    b     0
3     d    c     0
4     a    c     0
5     a    c     1
```
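To reproduce the example, the sample frame can be rebuilt in code. This is a minimal sketch that assumes pandas is installed; the values are copied row by row from the table above, and building it from a dict of columns is just one convenient option:

```python
import pandas as pd

# Sample frame from the question, one list per column (row order preserved)
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
print(df)
```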

I would like to group by `['ID_0', 'ID_1']` and produce a new dataframe containing, for each group, the sum of the `ID_2` values divided by the number of rows in that group.

```
import numpy as np
grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())
```

gives

```
           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0

ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64
```
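Since both results are indexed by the same `(ID_0, ID_1)` groups, one option is to divide them directly and let pandas align the rows by index. A minimal sketch, assuming pandas is installed; it rebuilds the sample frame so it runs on its own, and `ratio` / `ratio_df` are illustrative names, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
grouped = df.groupby(['ID_0', 'ID_1'])

# Series / Series: both share the (ID_0, ID_1) MultiIndex, so each
# group's sum is divided by that same group's row count.
ratio = grouped['ID_2'].sum() / grouped.size()
print(ratio)

# DataFrame version: agg with a dict returns a one-column DataFrame.
# A plain "/" would align the Series against the columns, so use
# .div(..., axis=0) to align it against the row index instead.
ratio_df = grouped.agg({'ID_2': 'sum'}).div(grouped.size(), axis=0)
print(ratio_df)
```

The `axis=0` argument is the important detail: without it, pandas would try to match the group keys against the column label `ID_2` and return NaNs.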

How can I get the new dataframe with the np.sum values divided by the size() values?

Answer

Use `groupby.apply` instead:

```
df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum() / len(x))
# ID_0  ID_1
# a     b       0.500000
#       c       0.666667
# d     c       0.000000
# dtype: float64
```
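A sum divided by the number of rows is the arithmetic mean of each group, so the built-in `mean` should give the same numbers without a lambda. A minimal sketch, assuming pandas is installed; the frame is rebuilt so the block runs on its own, and `ID_2_mean` is a hypothetical column name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
keys = ['ID_0', 'ID_1']

# Group mean of ID_2: sum of the values divided by how many there are
means = df.groupby(keys)['ID_2'].mean()

# The apply version, but on the ID_2 column only, so the lambda
# receives a Series rather than the whole sub-frame
via_apply = df.groupby(keys)['ID_2'].apply(lambda s: s.sum() / len(s))

# The question asks for a DataFrame: reset_index turns the grouped
# Series into one, with the group keys as ordinary columns
result_df = means.reset_index(name='ID_2_mean')
print(result_df)
```

The two agree as long as `ID_2` has no missing values. With NaNs they differ: `mean` divides by the count of non-null values, while `len(x)` counts every row in the group.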

Source (Stackoverflow)