user791411 - 1 year ago 100
Python Question

# pandas | Cumulative sum of unpivot data

In this "unpivoted" dataset, there are Orders composed of several Lots. Each Lot has a given Point value, as below:

``````CustID     Date         OrderNum   LotNum   PtsPerLot
A123       1/1/2015     1234       A        2
A123       1/1/2015     1234       B        10
A123       1/1/2015     5678       A        7
``````

My objective is to create a
`CUMULATIVE_POINTS_PER_YEAR`
column representing the cumulative sum of
`POINTS_PER_ORDER`
, which is itself a sum of
`PtsPerLot`
, at each
`Lot`
level. So, for a given lot,
`CumPtsPerYear`
would show the cumulative total of all
`POINTS_PER_ORDER`
for an account in a given year.

``````CustID     Date         OrderNum   LotNum   PtsPerLot    *PtsPerOrder*    *CumPtsPerYear*
A123       1/1/2015     1234       A        2            12              12
A123       1/1/2015     1234       B        10           12              12
A123       1/1/2015     5678       A        7            7               19
``````

Any ideas? I've tried
`groupby.cumsum`
on
`PtsPerLot`
and another
`groupby.cumsum`
on
`PtsPerOrder`
, but it isn't producing what I need.

First, calculate `PtsPerOrder`. Use `transform` to broadcast along the actual index of your dataframe the result of the calculation in each group:

``````df['PtsPerOrder'] = df.groupby('OrderNum')['PtsPerLot'].transform(sum)
``````

Then take the first element of that new column in each group:

``````df['CumPtsPerYear'] = df.groupby('OrderNum')['PtsPerOrder'].head(1)

df
Out[27]:
CustID      Date  OrderNum LotNum  PtsPerLot  PtsPerOrder  CumPtsPerYear
0   A123  1/1/2015      1234      A          2           12           12.0
1   A123  1/1/2015      1234      B         10           12            NaN
2   A123  1/1/2015      5678      A          7            7            7.0
``````

End the calculation by doing the cumulative sum you are searching for. It will skip the NA values. You complete your dataframe with a forward fill:

``````df['CumPtsPerYear'].cumsum().ffill()

0    12.0
1    12.0
2    19.0
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download