Mukul - 2 months ago
Python Question

Expanding rows in a pandas DataFrame

I have the following data:

product  Sales_band  Hour_id  sales
prod_1   HIGH        1        200
prod_1   HIGH        3        100
prod_1   HIGH        4        300
prod_1   VERY HIGH   2        100
prod_1   VERY HIGH   5        253
prod_1   VERY HIGH   6        234


I want to add rows based on the Hour_id value. Hour_id can take values from 1 to 10, so the data above should be expanded with a row for every missing hour ID, with sales = 0 in the added rows. Expected output:

product  Sales_band  Hour_id  sales
prod_1   HIGH        1        200
prod_1   HIGH        2        0
prod_1   HIGH        3        100
prod_1   HIGH        4        300
prod_1   HIGH        5        0
prod_1   HIGH        6        0
prod_1   HIGH        7        0
prod_1   HIGH        8        0
prod_1   HIGH        9        0
prod_1   HIGH        10       0
prod_1   VERY HIGH   1        0
prod_1   VERY HIGH   2        100
prod_1   VERY HIGH   3        0
prod_1   VERY HIGH   4        0
prod_1   VERY HIGH   5        253
prod_1   VERY HIGH   6        234
prod_1   VERY HIGH   7        0
prod_1   VERY HIGH   8        0
prod_1   VERY HIGH   9        0
prod_1   VERY HIGH   10       0


How can I achieve this with a pandas DataFrame?
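To make the example reproducible, the sample input above can be built directly as a DataFrame. This is a sketch: the column names and values are copied from the first table, and `df` is simply the variable name the answer uses.

```python
import pandas as pd

# Sample input from the question. 'VERY HIGH' contains a space, so the data
# is built from Python lists instead of being parsed as whitespace-separated text.
df = pd.DataFrame({
    'product': ['prod_1'] * 6,
    'Sales_band': ['HIGH'] * 3 + ['VERY HIGH'] * 3,
    'Hour_id': [1, 3, 4, 2, 5, 6],
    'sales': [200, 100, 300, 100, 253, 234],
})
print(df)
```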

Answer

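Use groupby with reindex: for each (product, Sales_band) group, make Hour_id the index and reindex it to range(1, 11), filling the newly created rows with 0: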

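import pandas as pd
# Select the columns with a list ([[...]]); the tuple form ['Hour_id','sales'] is deprecated and errors in recent pandas
print(df.groupby(['product', 'Sales_band'])[['Hour_id', 'sales']]
        .apply(lambda x: x.set_index('Hour_id').reindex(range(1, 11), fill_value=0))
        .reset_index())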

   product Sales_band  Hour_id  sales
0   prod_1       HIGH        1    200
1   prod_1       HIGH        2      0
2   prod_1       HIGH        3    100
3   prod_1       HIGH        4    300
4   prod_1       HIGH        5      0
5   prod_1       HIGH        6      0
6   prod_1       HIGH        7      0
7   prod_1       HIGH        8      0
8   prod_1       HIGH        9      0
9   prod_1       HIGH       10      0
10  prod_1  VERY HIGH        1      0
11  prod_1  VERY HIGH        2    100
12  prod_1  VERY HIGH        3      0
13  prod_1  VERY HIGH        4      0
14  prod_1  VERY HIGH        5    253
15  prod_1  VERY HIGH        6    234
16  prod_1  VERY HIGH        7      0
17  prod_1  VERY HIGH        8      0
18  prod_1  VERY HIGH        9      0
19  prod_1  VERY HIGH       10      0
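An alternative sketch that avoids groupby.apply: build the full grid of observed (product, Sales_band) pairs crossed with hours 1 to 10, then reindex the original rows onto that grid with fill_value=0. It assumes each (product, Sales_band, Hour_id) combination appears at most once, since reindex needs a unique index; the names `pairs`, `full_index` and `out` are illustrative only.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    'product': ['prod_1'] * 6,
    'Sales_band': ['HIGH'] * 3 + ['VERY HIGH'] * 3,
    'Hour_id': [1, 3, 4, 2, 5, 6],
    'sales': [200, 100, 300, 100, 253, 234],
})

# Every (product, Sales_band) pair that actually occurs, in order of first appearance
pairs = df[['product', 'Sales_band']].drop_duplicates()

# Cross each observed pair with hours 1..10 to get the target grid
full_index = pd.MultiIndex.from_tuples(
    [(p, b, h) for p, b in pairs.itertuples(index=False) for h in range(1, 11)],
    names=['product', 'Sales_band', 'Hour_id'],
)

# Align the original rows to the grid; hours with no row get sales = 0
out = (df.set_index(['product', 'Sales_band', 'Hour_id'])
         .reindex(full_index, fill_value=0)
         .reset_index())
print(out)
```

Using only the observed pairs (rather than pd.MultiIndex.from_product over every product and every band) mirrors the groupby answer, which never creates rows for a Sales_band that a product has no data for.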