Axel - 2 months ago 5

Python Question

I have a large

`pandas`

`df`

Imagine the relevant part of the structure as:

`Date SKU received`

2017-05-29 sku1 0

2017-05-30 sku1 0

2017-05-31 sku1 0

2017-06-01 sku1 0

2017-06-02 sku1 6

2017-06-03 sku1 2

2017-05-29 sku2 4

2017-05-30 sku2 4

2017-05-31 sku2 0

2017-06-01 sku2 0

2017-06-02 sku2 0

2017-06-03 sku2 24

From here I would like to reconstruct the order process. I know, that the stock level is reviewed on mondays, based on the stock level, new orders are placed. Orders arrive at the warehouse roughly a week later, sometimes split into multiple shippings.

I thought of creating an extra column for the weekdays (

`df["Weekday"]`

`df["Order"]`

The output could look like this:

`Date SKU received Weekday Order`

2017-05-29 sku1 0 0 8

2017-05-30 sku1 0 1 0

2017-05-31 sku1 0 2 0

2017-06-01 sku1 0 3 0

2017-06-02 sku1 6 4 0

2017-06-03 sku1 2 5 0

2017-05-29 sku2 4 0 24

2017-05-30 sku2 4 1 0

2017-05-31 sku2 0 2 0

2017-06-01 sku2 0 3 0

2017-06-02 sku2 0 4 0

2017-06-03 sku2 24 5 0

Here is the code I tried:

`import pandas as pd`

# 0 is Monday, 1 is Tuesday, etc

df["Weekday"] = df["Date"].dt.dayofweek

# create new column for the orders

df["Order"] = 0

min_days = 4

max_days = min_days + 7

for i in range(len(df)):

if df.loc[i, "Weekday"] == 0:

df.loc[i, "Order"] = df.loc[(df.Date >= df.loc[i, "Date"] + pd.to_timedelta(min_days, unit="D")) &

(df.Date < df.loc[i, "Date"] + pd.to_timedelta(max_days, unit="D")) &

(df.SKU == df.loc[i, "SKU"]), "received"].sum()

It seems to do the job, but in a slow way. Maybe someone can help me find a more pythonic/pandas approach to save some computation time.

Thanks for your help.

Answer Source

Here is a possible solution that uses pandas groupby and transform.

First idea is that you can achieve the count between two days by taking the difference of rolling sums. Also, note the trick of revert the order (`[::-1]`

) twice in order to have a rolling sum picking days in the future.

```
def count_between(ts, min_days, max_days):
return ts[::-1].pipe(lambda y: y.rolling(max_days,1).sum() - y.rolling(min_days-1,1).sum())[::-1]
```

This function would give you results for each day, so you restrict results to Mondays only setting all other entries to 0 (using `[.where][1]`

).

After setting `Date`

as index, you can do the following:

```
order = df.groupby('SKU')\
.transform(lambda x: count_between(x, min_days, max_days)\
.where(lambda y: y.index.dayofweek==0, other = 0))
order.columns = ['Order']
```

This gives the expected result:

```
pd.concat([df, order], axis = 1)
Out[319]:
SKU received Order
Date
2017-05-29 sku1 0 8.0
2017-05-30 sku1 0 0.0
2017-05-31 sku1 0 0.0
2017-06-01 sku1 0 0.0
2017-06-02 sku1 6 0.0
2017-06-03 sku1 2 0.0
2017-05-29 sku2 4 24.0
2017-05-30 sku2 4 0.0
2017-05-31 sku2 0 0.0
2017-06-01 sku2 0 0.0
2017-06-02 sku2 0 0.0
2017-06-03 sku2 24 0.0
```