Mukul - 1 year ago 71

Python Question

I am having **weekly( say 5 weeks)** sales and baskets for **product and store combination**, i want to find the total spend and visits of product **(irrespective of store)** for a particular **week(say 201520)**" i.e. 20th week of year 2015. the moment i select a week , there can be some product that are not sold in that week. but i **do not want to remove them from my group by**. Essentially i want all the products sold in 5 weeks , but if a product is not sold in the week i selected above , i want it to be present in my final dataFrame with aggregated numbers to be 0. Sample data.(lets **assume prod 122 is not sold in week 201520**)

`prod store week baskets sales`

123 112 201518 20 100.45

123 112 201519 21 89.65

123 112 201520 22 1890.54

122 112 201518 10 909.99

sample output (for 201520)

`prod total_baskets total_sales spend_per_basket`

123 22 1890.54 85.93363636

122 0 0 0

i know this can be done using

i am first selecting the week for which i am doing groupby.

then creating a list of all product present in my inital weekly dataset.

then remerging back with the group by data. I find this non - efficient. please help. Also need to create

dummy code:

`trans_for_my_week=weekly_trans[weekly_trans['week']==201520]`

avg_sales=pd.DataFrame(trans_for_my_week.groupby(['prod']).agg({'baskets': {'total_baskets':'sum'},

'sales' :{'total_sales':'sum'}}))

avg_sales_period_0.columns=avg_sales_period_0.columns.droplevel(0)

avg_sales_period_0=avg_sales_period_0.reset_index()

and so on

Answer Source

**UPDATE2:** adding new calculated multi-level column:

```
In [229]: x = res.sales / res.baskets
In [230]: x
Out[230]:
week 201518 201519 201520
prod
122 90.9990 NaN NaN
123 5.0225 4.269048 85.933636
In [231]: x.columns = pd.MultiIndex.from_product(['spend_per_basket', res.columns.get_level_values(1).drop_duplicates()])
In [232]: x
Out[232]:
spend_per_basket
201518 201519 201520
prod
122 90.9990 NaN NaN
123 5.0225 4.269048 85.933636
In [234]: res = res.join(x)
In [235]: res
Out[235]:
baskets sales spend_per_basket
week 201518 201519 201520 201518 201519 201520 201518 201519 201520
prod
122 10 0 0 909.99 0.00 0.00 90.9990 NaN NaN
123 20 21 22 100.45 89.65 1890.54 5.0225 4.269048 85.933636
```

PS here is very well documented multiindexing (multi-level) pandas techniques with lots of examples

**UPDATE:** inspired by @JoeR's solution - here is modified `pivot_table()`

version:

```
res = df.pivot_table(index='prod', columns='week', values=['baskets','sales'], aggfunc='sum', fill_value=0)
In [189]: res
Out[189]:
baskets sales
week 201518 201519 201520 201518 201519 201520
prod
122 10 0 0 909.99 0.00 0.00
123 20 21 22 100.45 89.65 1890.54
In [190]: res[[('baskets',201519)]]
Out[190]:
baskets
week 201519
prod
122 0
123 21
In [192]: res.ix[122, [('sales',201519)]]
Out[192]:
week
sales 201519 0.0
Name: 122, dtype: float64
```

you can also flatten your column levels as follows:

```
In [194]: res2 = res.copy()
In [196]: res2.columns = ['{0[0]}_{0[1]}'.format(col) for col in res2.columns]
In [197]: res2
Out[197]:
baskets_201518 baskets_201519 baskets_201520 sales_201518 sales_201519 sales_201520
prod
122 10 0 0 909.99 0.00 0.00
123 20 21 22 100.45 89.65 1890.54
```

but i would keep it as multilevel column so you could use advanced indexing (like in the example above)

**OLD answer:**

I'd calculate it once for all your data:

```
from itertools import product
In [165]: %paste
g = df.groupby(['week', 'prod']).agg({'baskets':'sum', 'sales':'sum'}).reset_index()
al = pd.DataFrame(list(product(df['prod'].unique(), df.week.unique())), columns=['prod','week'])
res = pd.merge(al, g, on=['prod','week'], how='left').fillna(0)
## -- End pasted text --
In [166]: res
Out[166]:
prod week sales baskets
0 123 201518 100.45 20.0
1 123 201519 89.65 21.0
2 123 201520 1890.54 22.0
3 122 201518 909.99 10.0
4 122 201519 0.00 0.0
5 122 201520 0.00 0.0
```