Miyashita Hikaru - 5 months ago 31

Python Question

I have dataframe like this

`A B C`

0 1 7 a

1 2 8 b

2 3 9 c

3 4 10 a

4 5 11 b

5 6 12 c

I would like to get groupby result (key=column C) below;

`A B`

d 12 36

"d" means a or b ,

so I would like to groupby only with "a" and "b".

and then put together as "d".

when I sum up with all the key elements then drop, it consume much time....

Answer

One option is to use `pandas`

`where`

to transform the C column so that where it was `a`

or `b`

becomes `d`

and then you can groupby the transformed column and do the normal summary on it, and if rows with `c`

is not desired, you can simply drop it after the summary:

```
df_sum = df.groupby(df.C.where(~df.C.isin(['a', 'b']), "d")).sum().reset_index()
df_sum
# C A B
#0 c 9 21
#1 d 12 36
df_sum.loc[df_sum.C == "d"]
# C A B
#1 d 12 36
```

To see more clearly how the `where`

clause works:

```
df.C.where(~df.C.isin(['a','b']), 'd')
# 0 d
# 1 d
# 2 c
# 3 d
# 4 d
# 5 c
# Name: C, dtype: object
```

It acts like a replace method and replace `a`

and `b`

with `d`

which will be grouped together when passed to `groupby`

function.