bluesummers - 3 years ago 170
Python Question

# Summing values across given range of days difference backwards - Pandas

I have created a days difference column in a pandas dataframe, and I'm looking to add a column that has the sum of a specific value over a given days window backwards

Notice that I can supply a date column for each row if it is needed, but the diff was created as days difference from the first day of the data.

Example

``````df = pd.DataFrame.from_dict({'diff': [0,0,1,2,2,2,2,10,11,15,18],
'value': [10,11,15,2,5,7,8,9,23,14,15]})
df
Out[12]:
diff  value
0      0     10
1      0     11
2      1     15
3      2      2
4      2      5
5      2      7
6      2      8
7     10      9
8     11     23
9     15     14
10    18     15
``````

`5_days_back_sum`
column that will sum the past 5 days, including same day so the result would be like this

``````Out[15]:
5_days_back_sum  diff  value
0                21     0     10
1                21     0     11
2                36     1     15
3                58     2      2
4                58     2      5
5                58     2      7
6                58     2      8
7                 9    10      9
8                32    11     23
9                46    15     14
10               29    18     15
``````

How can I achieve that? Originally I have a date column to create the diff column, if that helps its available

Use custom function with `boolean indexing` for filtering range with `sum`:

``````def f(x):
return df.loc[(df['diff'] >= x - 5) & (df['diff'] <= x), 'value'].sum()

df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
diff  value  5_days_back_sum
0      0     10               21
1      0     11               21
2      1     15               36
3      2      2               58
4      2      5               58
5      2      7               58
6      2      8               58
7     10      9                9
8     11     23               32
9     15     14               46
10    18     15               29
``````

Similar solution with `between`:

``````def f(x):
return df.loc[df['diff'].between(x - 5, x), 'value'].sum()

df['5_days_back_sum'] = df['diff'].apply(f)
print (df)
diff  value  5_days_back_sum
0      0     10               21
1      0     11               21
2      1     15               36
3      2      2               58
4      2      5               58
5      2      7               58
6      2      8               58
7     10      9                9
8     11     23               32
9     15     14               46
10    18     15               29
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download