Ashkan - 1 year ago 35

Python Question

I have a csv file with four columns: date, wholesaler, product, and sales. I am looking for finding average of previous sales for each Product and Wholesaler combination at each date. It means what is the average previous sales of product 'A' at wholesaler 'B' at time 'C'. For instance we know sales of product 'A' at wholesaler 'B' at Jan, Apr, May, Aug that are 100, 200, 300, 400 respectively. Let assume we do not have any record before Jan. So the average of previous sale of product 'A' in wholesaler 'B' at Apr is equal to 100/1, and at May is equal to (200+100)/2 and at Aug is (300+200+100)/3. The following table shows my data:

`date wholesaler product sales`

12/31/2012 53929 UPE54 4

12/31/2012 13131 UPE55 1

2/23/2013 13131 UPE55 1156

4/24/2013 13131 UPE55 1

12/1/2013 83389 UPE54 9

12/17/2013 83389 UPE54 1

12/18/2013 52237 UPE54 9

12/19/2013 53929 UME24 1

12/31/2013 82204 UPE55 9

12/31/2013 11209 UME24 4

12/31/2013 52237 UPE54 1

Now I am using this code:

`df = pd.read_csv('Sample.csv',index_col='date')`

df2 = df.groupby(['wholesaler','product'])['sales'].mean()

That gives an average sales for each wholesaler-product while I am looking for average of previous sales at each date.

`wholesaler product avg sales`

11209 UME24 4.00

13131 UPE55 713.00

22423 UME24 1.00

24302 U4E16 121.00

Thank you for your help!

Answer

It was quite tricky for me but works anyhow. Expecting more elegant solution from others.

```
import pandas as pd
import datetime
dateparse = lambda x: pd.datetime.strptime(x, '%m/%d/%Y')
df = pd.read_csv('Sample.csv',index_col='date', parse_dates=[0], date_parser=dateparse)
expd_gb = df.reset_index().groupby(['wholesaler', 'product'])['sales'].apply(pd.Series.expanding)
idx = df.reset_index().groupby(['wholesaler', 'product', 'date'])['sales'].count().index
cnct = pd.concat([expd_gb.iloc[n].mean().shift(1) for n in range(len(expd_gb))])
cnct.index = idx
cnct.to_csv('TotalAvg.csv')
```

Result,

```
wholesaler product date
11209 UME24 2013-12-31 NaN
13131 UPE55 2012-12-31 NaN
2013-02-23 1.0
2013-04-24 578.5
52237 UPE54 2013-12-18 NaN
2013-12-31 9.0
53929 UME24 2013-12-19 NaN
UPE54 2012-12-31 NaN
82204 UPE55 2012-12-31 NaN
83389 UPE54 2013-12-01 NaN
2013-12-17 9.0
```

Source (Stackoverflow)