daiyue daiyue - 8 months ago 70
Python Question

pandas how to aggregate sum on a column depending on values in other columns

I am trying to sum values in a column by

groupby
on values in a second column, but meanwhile also considering values on a 3rd column, the
df
is like,

id memo amount
1 pos 1.0
1 pos 2.0
1 neg 3.0
2 pos 4.0
2 pos 5.0
2 neg 6.0
2 neg 7.0


I want to group by
id
and sum
amount
, but each group, if
memo
is
pos
it is positive and
neg
for negative, e.g. when
groupby
1
, the total amount is 0, since
-1.0 - 2.0 + 3.0 = 0
.

If I do
df.groupby('id')['amount'].sum()
, it only considers
id
and
amount
column, I am wondering how to also take
memo
into account here.

so the result will look like,

id memo amount total_amount
1 pos 1.0 0.0
1 pos 2.0 0.0
1 neg 3.0 0.0
2 pos 4.0 -4.0
2 pos 5.0 -4.0
2 neg 6.0 -4.0
2 neg 7.0 -4.0

Answer Source

Splitting the operation in two steps, you can achieve what you want through

df['temp'] = np.where(df.memo == 'pos', df.amount, -df.amount)
df['total_amount'] = df.groupby('id').temp.transform(sum)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download