Mukul - 6 months ago 29

Python Question

I'm working on a problem and have the following DataFrame in Python:

```
     week  hour    week_hr  store_code  baskets
0  201616   106  201616106         505        0
1  201616   107  201616107         505        0
2  201616   108  201616108         505        0
3  201616   109  201616109         505       18
4  201616   110  201616110         505        0
5  201616   106  201616108         910        0
6  201616   107  201616106         910        0
7  201616   108  201616107         910        2
8  201616   109  201616108         910        3
9  201616   110  201616109         910       10
```

Here the `hour` variable is a concatenation of the weekday and the shop hour: for example, if the weekday is Monday (= 1) and the shop hour is 6 am, then `hour` = 106. Similarly, `week_hr` is a concatenation of `week` and `hour`. I want to get the rows that show a run of consecutive hours with no baskets (0 baskets), for example:

```
     week  hour    week_hr  store_code  baskets
0  201616   106  201616106         505        0
1  201616   107  201616107         505        0
2  201616   108  201616108         505        0
```

Can I do this in Python using pandas and loops? The data needs to be sorted by store and hour. I'm completely new to Python.
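The encoding described above can be reproduced with plain arithmetic instead of string concatenation. This is a minimal sketch assuming the raw data has separate `weekday` (Monday = 1) and `shop_hour` (0 to 23) columns; both column names are hypothetical. Because the hour is zero-padded to two digits, consecutive hours within one day differ by exactly 1, but the step from 23:00 on one day to 00:00 on the next does not (123 to 200).

```python
import pandas as pd

# Hypothetical raw columns: week (YYYYWW), weekday (Monday = 1), shop_hour (0-23)
raw = pd.DataFrame({
    'week': [201616, 201616, 201616],
    'weekday': [1, 1, 2],
    'shop_hour': [6, 23, 0],
})

# "hour" = weekday followed by a two-digit hour, e.g. Monday 6 am -> 106
raw['hour'] = raw['weekday'] * 100 + raw['shop_hour']

# "week_hr" = week followed by the three-digit hour, e.g. 201616 and 106 -> 201616106
raw['week_hr'] = raw['week'] * 1000 + raw['hour']

print(raw)
```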

Answer

Do the following:

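- Sort by `store_code`, then `week_hr`.
- Keep only the rows with 0 baskets.
- Compute the difference between neighbouring `week_hr` values, `df['week_hr'].values[1:] - df['week_hr'].values[:-1]`; a difference of 1 means the two hours are consecutive.
- Give each run of consecutive hours its own group label, count the rows per store and group, and keep the groups that are long enough.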
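The difference-then-group idea in the last two points is a common NumPy idiom: mark where a value does not directly follow the previous one, then take a cumulative sum so that every run of consecutive values gets its own label. A small illustration on made-up `week_hr` values (not taken from the question's data):

```python
import numpy as np

# Made-up week_hr values, already sorted and filtered to 0 baskets
week_hr = np.array([201616106, 201616107, 201616108, 201616110, 201616111])

# True where a value directly follows the previous one (difference of 1)
continuous = np.diff(week_hr) == 1  # [True, True, False, True]

# Each break (False) starts a new run; the first row always starts run 0
groups = np.cumsum(np.hstack([False, ~continuous]))
print(groups)  # [0 0 0 1 1]

# Length of each run, which is what the final filter compares against
run_lengths = np.bincount(groups)
print(run_lengths)  # [3 2]
```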

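```python
import numpy as np
import pandas as pd

# 1. Sort by store, then by week/hour
t1 = df.sort_values(['store_code', 'week_hr'])

# 2. Keep only the rows with 0 baskets (.copy() avoids a SettingWithCopyWarning below)
t2 = t1[t1['baskets'] == 0].copy()

# 3. A difference of 1 between neighbouring week_hr values means consecutive hours;
#    every break starts a new group
continuous = t2['week_hr'].values[1:] - t2['week_hr'].values[:-1] == 1
groups = np.cumsum(np.hstack([False, ~continuous]))
t2['groups'] = groups

# 4. Count rows per (store, group) and keep runs longer than 2 hours
t3 = t2.groupby(['store_code', 'groups'], as_index=False)['week_hr'].count()
t4 = t3[t3['week_hr'] > 2]
print(pd.merge(t2, t4[['store_code', 'groups']]))
```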

There's no need for looping!
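For completeness, here is a self-contained variant that builds the sample data from the question and finds the runs with pandas' `groupby`, `diff` and `transform` instead of NumPy slicing. It is a sketch, not the answer's original code; the minimum run length of 3 (`min_run`) is an assumption based on the expected output shown in the question. Calling `diff` per `store_code` means a run can never continue from one store into the next.

```python
import pandas as pd

# Sample data from the question (values copied as given)
df = pd.DataFrame({
    'week': [201616] * 10,
    'hour': [106, 107, 108, 109, 110] * 2,
    'week_hr': [201616106, 201616107, 201616108, 201616109, 201616110,
                201616108, 201616106, 201616107, 201616108, 201616109],
    'store_code': [505] * 5 + [910] * 5,
    'baskets': [0, 0, 0, 18, 0, 0, 0, 2, 3, 10],
})

min_run = 3  # assumed minimum number of consecutive zero-basket hours

# Sort by store and hour, then keep only the hours with no baskets
zeros = df.sort_values(['store_code', 'week_hr'])
zeros = zeros[zeros['baskets'] == 0]

# Within each store, a new run starts whenever the gap to the previous
# zero-basket hour is not exactly 1 (the first row of each store gives NaN)
new_run = zeros.groupby('store_code')['week_hr'].diff().ne(1)
run_id = new_run.cumsum().rename('run_id')

# Number of rows in the run each row belongs to
run_len = zeros.groupby(run_id)['week_hr'].transform('count')

result = zeros[run_len >= min_run]
print(result)
```

With the question's data this keeps rows 0, 1 and 2 of store 505, matching the expected output.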