user791411 user791411 - 15 days ago 8
Python Question

pandas | Cumulative sum of unpivot data

Apologies if this question has already been asked, but thank you in advance for your help.

In this "unpivoted" dataset, there are Orders composed of several Lots. Each Lot has a given Point value, as below:

CustID Date OrderNum LotNum PtsPerLot
A123 1/1/2015 1234 A 2
A123 1/1/2015 1234 B 10
A123 1/1/2015 5678 A 7


My objective is to create a
CUMULATIVE_POINTS_PER_YEAR
column representing the cumulative sum of
POINTS_PER_ORDER
, which is itself a sum of
PtsPerLot
, at each
Lot
level. So, for a given lot,
CumPtsPerYear
would show the cumulative total of all
POINTS_PER_ORDER
for an account in a given year.

CustID Date OrderNum LotNum PtsPerLot *PtsPerOrder* *CumPtsPerYear*
A123 1/1/2015 1234 A 2 12 12
A123 1/1/2015 1234 B 10 12 12
A123 1/1/2015 5678 A 7 7 19


Any ideas? I've tried
groupby.cumsum
on
PtsPerLot
and another
groupby.cumsum
on
PtsPerOrder
, but it isn't producing what I need.

Answer

First, calculate PtsPerOrder. Use transform to broadcast along the actual index of your dataframe the result of the calculation in each group:

df['PtsPerOrder'] = df.groupby('OrderNum')['PtsPerLot'].transform(sum)

Then take the first element of that new column in each group:

df['CumPtsPerYear'] = df.groupby('OrderNum')['PtsPerOrder'].head(1)

df
Out[27]: 
  CustID      Date  OrderNum LotNum  PtsPerLot  PtsPerOrder  CumPtsPerYear
0   A123  1/1/2015      1234      A          2           12           12.0
1   A123  1/1/2015      1234      B         10           12            NaN
2   A123  1/1/2015      5678      A          7            7            7.0

End the calculation by doing the cumulative sum you are searching for. It will skip the NA values. You complete your dataframe with a forward fill:

df['CumPtsPerYear'].cumsum().ffill()

0    12.0
1    12.0
2    19.0