Miyashita Hikaru - 9 months ago 57

Python Question

I have a dataframe below

`df=pd.DataFrame({"A":np.random.randint(1,10,9),"B":np.random.randint(1,10,9),"C":list('abbcacded')})`

A B C

0 9 6 a

1 2 2 b

2 1 9 b

3 8 2 c

4 7 6 a

5 3 5 c

6 1 3 d

7 9 9 e

8 3 4 d

I would like to get grouping result (with key="C" column) below,and the row c d and e is dropped intentionally.

`number A_sum B_sum`

a 2 16 15

b 2 3 11

this is 2row*3column dataframe. the grouping key is column C. And

The column "number"represents the count of each letter(a and b).

A_sum and B_sum represents grouping sum of letters in column C.

I guess we should use method groupby but how can I get this data summary table ?

Answer

One option is to count the size and sum the columns for each group separately and then join them by index:

```
df.groupby("C")['A'].agg({"number": 'size'}).join(df.groupby('C').sum())
number A B
# C
# a 2 11 8
# b 2 14 12
# c 2 8 5
# d 2 11 12
# e 1 7 2
```

You can also do `df.groupby('C').agg(["sum", "size"])`

which gives an extra duplicated size column, but if you are fine with that, it should also work.

Source (Stackoverflow)