Noobie - 2 months ago 21

Python Question

I have a dataframe as follows

`df = pd.DataFrame({'X': np.random.randn(50000)}, index=pd.date_range('1/1/2000', periods=50000, freq='min'))`

    df.head(10)
    Out[37]:
                                X
    2000-01-01 00:00:00 -0.699565
    2000-01-01 00:01:00 -0.646129
    2000-01-01 00:02:00  1.339314
    2000-01-01 00:03:00  0.559563
    2000-01-01 00:04:00  1.529063
    2000-01-01 00:05:00  0.131740
    2000-01-01 00:06:00  1.282263
    2000-01-01 00:07:00 -1.003991
    2000-01-01 00:08:00 -1.594918
    2000-01-01 00:09:00 -0.775230

I would like to create a variable that contains the `sum` over the last 5 days (**not including the current observation**), only considering observations that fall at the exact same time of day as the current observation.

In other words:

- At index `2000-01-01 00:00:00`, `df['rolling_sum_same_hour']` contains the sum of the values of X observed at `00:00:00` during the last 5 days in the data (not including `2000-01-01`, of course).
- At index `2000-01-01 00:01:00`, `df['rolling_sum_same_hour']` contains the sum of the values of X observed at `00:01:00` during the last 5 days, and so on.

The intuitive idea is that intraday prices have intraday seasonality, and I want to get rid of it that way.

I tried to use `df['rolling_sum_same_hour']=df.at_time(df.index.minute).rolling(window=5).sum()` with no success.

Any ideas?

Many thanks!

Answer

Behold the power of `groupby`!

```
# df as defined in the question; group_keys=False keeps the original index so the result aligns with df
df['rolling_sum_by_time'] = df.groupby(df.index.time, group_keys=False)['X'].apply(lambda x: x.shift(1).rolling(5).sum())
```

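It's a lot to swallow in one line, but here is what happens: we group by time of day (as in Python's `datetime.time`), then select the column we care about (otherwise `apply` would receive every column of each group instead of just the `X` values of each time group), and then apply the function you want. Inside that function, `shift(1)` pushes the values down by one row so the current observation is excluded, and `rolling(...).sum()` adds up the preceding observations at that same time of day.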
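To make those steps concrete, here is a minimal self-contained sketch on a small frame where the result can be checked by hand. The helper name `rolling_sum_same_time` and its `col` and `window` parameters are my own and not part of pandas. It uses `transform` instead of `apply`, which is an alternative that returns a result indexed like the input, and it assumes there is exactly one observation per day at each time, so that 5 rows in a group mean 5 days.

```python
import numpy as np
import pandas as pd


def rolling_sum_same_time(df: pd.DataFrame, col: str = "X", window: int = 5) -> pd.Series:
    """Sum of the previous `window` observations of `col` at the same time of day.

    Hypothetical helper for illustration; assumes a sorted DatetimeIndex.
    """
    return df.groupby(df.index.time)[col].transform(
        # shift(1) moves each value down one row within its time-of-day group,
        # so the current observation is excluded; rolling(window).sum() then
        # adds up the `window` earlier rows of that group.
        lambda s: s.shift(1).rolling(window).sum()
    )


# Ten days of minute data with easy-to-check values (0, 1, 2, ...).
idx = pd.date_range("2000-01-01", periods=10 * 24 * 60, freq="min")
df = pd.DataFrame({"X": np.arange(len(idx), dtype=float)}, index=idx)
df["rolling_sum_same_hour"] = rolling_sum_same_time(df)

# Cross-check one timestamp against a brute-force sum over the 5 previous days.
ts = pd.Timestamp("2000-01-07 00:01")
expected = sum(df.at[ts - pd.Timedelta(days=k), "X"] for k in range(1, 6))
assert df.at[ts, "rolling_sum_same_hour"] == expected
```

The first 5 days come out as `NaN`, because there are not yet 5 earlier observations at the same time. If some days can be missing from the data, 5 rows no longer correspond to 5 calendar days, and this row-count approach would need to be replaced by a time-based window.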

Source (Stackoverflow)
