Zico Zico - 1 month ago 8
R Question

Moving average of past 5-minutes data against each tick

This is my Data

times value size return
2016-06-01 9:07:11 14.2 595 0
2016-06-01 9:08:11 14.2 2505 0.003527341
2016-06-01 9:11:03 14.15 1 0
2016-06-01 9:13:03 14.15 2200 0.003527341
2016-06-01 9:15:04 14.2 480 0
2016-06-01 9:15:04 14.2 2965 0.003527341
2016-06-01 9:15:05 14.2 144 0
2016-06-01 9:20:05 14.2 1856 0.003514942
2016-06-01 9:22:06 14.25 300 0
2016-06-01 9:25:06 14.25 856 0.003514942


I want to create another column
"5min_Ret"
for each tick, whose value should be
last 5 mins average of return
. Below is the desired output with logic of calculation mentioned at the end of each row. Logic Column is just to explain here. It will be not added in final output.

times value size return 5min_Ret Logic
2016-06-01 9:07:11 14.2 595 0 0 First Tick 0
2016-06-01 9:08:11 14.2 2505 0.003527341 0.001763671 Avg of 1 to 2
2016-06-01 9:11:03 14.15 1 0 0.00117578 Avg of 1 to 3
2016-06-01 9:13:03 14.15 2200 0.003527341 0.002351561 Avg of 2 to 4
2016-06-01 9:15:04 14.2 480 0 0.00117578 Avg of 3 to 5
2016-06-01 9:15:04 14.2 2965 0.003527341 0.001763671 Avg of 3 to 6
2016-06-01 9:15:05 14.2 144 0 0.001410936 Avg of 3 to 7
2016-06-01 9:20:05 14.2 1856 0.003514942 0.001757471 Avg of 7 to 8
2016-06-01 9:22:06 14.25 300 0 0.001757471 Avg of 8 to 9
2016-06-01 9:25:06 14.25 856 0.003514942 0.001757471 Avg of 9 to 10


I assume
dplyr
package will be useful for group by. But for each tick I am not able to be successful to get group by data by interval 5 mins. Any suggestion/help in R is appreciated.

Thanks.

Answer

You can achieve this with sapply. Let's say your object is called df:

df$'5min_ret' <- sapply( X = seq_along( df$return ), 
                        FUN = function(x) { 
                            mean( df$return[ df$times >= df$times[x] - 5*60 & 
                                                   df$times <= df$times[x] ] ) 
                        } )

Note the seq_along call is simply creating a vector sequence the same length as the number of rows in your data frame (in your case, 10).

The function defined after FUN is what's important. That function takes a subset of the dataframe where the time is within the last 5 mins (greater than 5mins ago, and less than now), and takes the mean of the return column that's left. sapply simply runs that function for every value of X (that's our 1:10 sequence).

NOTE however that calling your column 5min_ret is generally not a great idea, since R doesn't particularly like names of that form. I've surrounded it in quotations on creation to get around that, but I'd recommend thinking of a different name.