Miyashita Hikaru Miyashita Hikaru - 3 months ago 9
Python Question

groupby, sum and count to one table

I have a dataframe below

df=pd.DataFrame({"A":np.random.randint(1,10,9),"B":np.random.randint(1,10,9),"C":list('abbcacded')})

A B C
0 9 6 a
1 2 2 b
2 1 9 b
3 8 2 c
4 7 6 a
5 3 5 c
6 1 3 d
7 9 9 e
8 3 4 d


I would like to get grouping result (with key="C" column) below,and the row c d and e is dropped intentionally.

number A_sum B_sum
a 2 16 15
b 2 3 11


this is 2row*3column dataframe. the grouping key is column C. And
The column "number"represents the count of each letter(a and b).
A_sum and B_sum represents grouping sum of letters in column C.

I guess we should use method groupby but how can I get this data summary table ?

Answer

One option is to count the size and sum the columns for each group separately and then join them by index:

df.groupby("C")['A'].agg({"number": 'size'}).join(df.groupby('C').sum())

    number  A   B
# C         
# a     2   11  8
# b     2   14  12
# c     2   8   5
# d     2   11  12
# e     1   7   2

You can also do df.groupby('C').agg(["sum", "size"]) which gives an extra duplicated size column, but if you are fine with that, it should also work.

Comments