Miyashita Hikaru Miyashita Hikaru - 2 months ago 17
Python Question

how to groupby in complicated condition in pandas

I have dataframe like this

A B C
0 1 7 a
1 2 8 b
2 3 9 c
3 4 10 a
4 5 11 b
5 6 12 c


I would like to get groupby result (key=column C) below;

A B
d 12 36


"d" means a or b ,

so I would like to groupby only with "a" and "b".

and then put together as "d".

when I sum up with all the key elements then drop, it consume much time....

Answer

One option is to use pandas where to transform the C column so that where it was a or b becomes d and then you can groupby the transformed column and do the normal summary on it, and if rows with c is not desired, you can simply drop it after the summary:

df_sum = df.groupby(df.C.where(~df.C.isin(['a', 'b']), "d")).sum().reset_index()

df_sum
#   C   A   B
#0  c   9   21
#1  d   12  36

df_sum.loc[df_sum.C == "d"]

#   C   A   B
#1  d   12  36

To see more clearly how the where clause works:

df.C.where(~df.C.isin(['a','b']), 'd')

# 0    d
# 1    d
# 2    c
# 3    d
# 4    d
# 5    c
# Name: C, dtype: object

It acts like a replace method and replace a and b with d which will be grouped together when passed to groupby function.

Comments