user791411 - 1 month ago 28

Python Question

Apologies if this question has already been asked, but thank you in advance for your help.

In this "unpivoted" dataset, there are **Orders** composed of several **Lots**. Each **Lot** has a given **Point** value, as below:

| CustID | Date | OrderNum | LotNum | PtsPerLot |
|---|---|---|---|---|
| A123 | 1/1/2015 | 1234 | A | 2 |
| A123 | 1/1/2015 | 1234 | B | 10 |
| A123 | 1/1/2015 | 5678 | A | 7 |

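My objective is to create a `CUMULATIVE_POINTS_PER_YEAR` column. To get there I first need `POINTS_PER_ORDER`, the sum of `PtsPerLot` over every `Lot` in an order; `CumPtsPerYear` is then the running total of `POINTS_PER_ORDER`, as below: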

| CustID | Date | OrderNum | LotNum | PtsPerLot | *PtsPerOrder* | *CumPtsPerYear* |
|---|---|---|---|---|---|---|
| A123 | 1/1/2015 | 1234 | A | 2 | 12 | 12 |
| A123 | 1/1/2015 | 1234 | B | 10 | 12 | 12 |
| A123 | 1/1/2015 | 5678 | A | 7 | 7 | 19 |

Any ideas? I've tried `groupby.cumsum` on `PtsPerLot` and `groupby.cumsum` on `PtsPerOrder`, but neither produces the result above.

Answer

First, calculate `PtsPerOrder`. Use `transform` to *broadcast* the result of each group's calculation along the original index of your dataframe:

```
df['PtsPerOrder'] = df.groupby('OrderNum')['PtsPerLot'].transform('sum')
```
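To see what *broadcast* means here, the following is a small sketch that rebuilds the sample data from the question and compares a plain aggregation with `transform`. The variable names `per_order` and `broadcast` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'CustID': ['A123', 'A123', 'A123'],
    'Date': ['1/1/2015', '1/1/2015', '1/1/2015'],
    'OrderNum': [1234, 1234, 5678],
    'LotNum': ['A', 'B', 'A'],
    'PtsPerLot': [2, 10, 7],
})

# Plain aggregation: one value per order, indexed by OrderNum.
per_order = df.groupby('OrderNum')['PtsPerLot'].sum()

# transform: same length and index as df; every row gets its order's total.
broadcast = df.groupby('OrderNum')['PtsPerLot'].transform('sum')

print(per_order)
print(broadcast)
```

Because `broadcast` keeps the index of `df`, it can be assigned straight back as a new column.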

Then keep the value of that new column only on the first row of each group. The assignment aligns on the index, so the remaining rows become `NaN`:

```
df['CumPtsPerYear'] = df.groupby('OrderNum')['PtsPerOrder'].head(1)
df
Out[27]:
  CustID      Date  OrderNum LotNum  PtsPerLot  PtsPerOrder  CumPtsPerYear
0   A123  1/1/2015      1234      A          2           12           12.0
1   A123  1/1/2015      1234      B         10           12            NaN
2   A123  1/1/2015      5678      A          7            7            7.0
```

Finish with the cumulative sum you are after. `cumsum` skips the `NaN` values and leaves them in place, so a forward fill completes the column:

```
df['CumPtsPerYear'] = df['CumPtsPerYear'].cumsum().ffill()
df['CumPtsPerYear']
0 12.0
1 12.0
2 19.0
```
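The steps above accumulate over the whole dataframe. If the total should restart for each customer and each calendar year, as the column name suggests, one possible variation is sketched below. It is not the method above: instead of `NaN` plus a forward fill, it counts each order's total once by zeroing out every lot after the first. It assumes that `Date` holds month/day/year strings, that `OrderNum` is unique across customers and years, and that the rows are already sorted in the order the running total should follow, with the lots of an order adjacent. The names `year` and `first_lot` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'CustID': ['A123', 'A123', 'A123'],
    'Date': ['1/1/2015', '1/1/2015', '1/1/2015'],
    'OrderNum': [1234, 1234, 5678],
    'LotNum': ['A', 'B', 'A'],
    'PtsPerLot': [2, 10, 7],
})

# Total points of each order, repeated on every lot of that order.
df['PtsPerOrder'] = df.groupby('OrderNum')['PtsPerLot'].transform('sum')

# Assumption: dates are month/day/year strings.
year = pd.to_datetime(df['Date'], format='%m/%d/%Y').dt.year

# True on the first lot of each order, False on the others.
first_lot = ~df.duplicated('OrderNum')

# Count each order once (0 on its other lots), then take the running
# total separately for every (customer, year) pair.
df['CumPtsPerYear'] = (
    df['PtsPerOrder']
    .where(first_lot, 0)
    .groupby([df['CustID'], year])
    .cumsum()
)

print(df)
```

On the sample data this gives 12, 12, 19 for `CumPtsPerYear`, matching the desired output in the question.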