Noobie - 1 year ago 63
R Question

# R: how to resample intraday data at the group level?

Consider the following dataframe:

``````
time <- c('2016-04-13 23:07:45','2016-04-13 23:07:50','2016-04-13 23:08:45','2016-04-13 23:08:45',
          '2016-04-13 23:08:45','2016-04-13 23:07:50','2016-04-13 23:07:51')
group <- c('A','A','A','B','B','B','B')
value <- c(5,10,2,2,NA,1,4)
df <- data.frame(time,group,value)

> df
                 time group value
1 2016-04-13 23:07:45     A     5
2 2016-04-13 23:07:50     A    10
3 2016-04-13 23:08:45     A     2
4 2016-04-13 23:08:45     B     2
5 2016-04-13 23:08:45     B    NA
6 2016-04-13 23:07:50     B     1
7 2016-04-13 23:07:51     B     4
``````

I would like to resample this dataframe at the 5-second level within each `group`, and compute the sum of `value` for each combination of time interval and group.

The interval should be closed on the left and open on the right. For instance, the first line of output should be `2016-04-13 23:07:45 A 5`, because the first 5-second interval is `[2016-04-13 23:07:45, 2016-04-13 23:07:50[`.

How can I do that in either `dplyr` or `data.table`? Do I need to import `lubridate` for the timestamps?

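Here is one way to do it with `dplyr` and `lubridate`:

``````
library(dplyr)      # group_by(), summarise(), %>%
library(lubridate)  # ymd_hms(), floor_date()

Group5 <- function(myDf) {
  myDf$time <- ymd_hms(myDf$time)                              # parse the timestamps
  myDf$timeGroup <- floor_date(myDf$time, unit = "5 seconds")  # start of each 5-second interval
  summarise(myDf %>% group_by(group, timeGroup), sum(value, na.rm = TRUE))
}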

Group5(df)
Source: local data frame [5 x 3]
Groups: group [?]

group           timeGroup `sum(value, na.rm = TRUE)`
<fctr>              <dttm>                      <dbl>
1      A 2016-04-13 23:07:45                          5
2      A 2016-04-13 23:07:50                         10
3      A 2016-04-13 23:08:45                          2
4      B 2016-04-13 23:07:50                          5
5      B 2016-04-13 23:08:45                          2
``````

It uses `ymd_hms` and `floor_date` from `lubridate` to parse each timestamp and round it down to the start of its 5-second interval, so each interval is closed on the left and open on the right.
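On the `lubridate` question: the flooring step can also be written in base R, which makes the left-closed, right-open binning explicit. The following is a minimal sketch and not part of the original answer. The helper name `floor5` is hypothetical, and it assumes the timestamps can be treated as UTC.

``````r
# Hypothetical helper: floor POSIXct timestamps to the start of their bin.
floor5 <- function(x, width = 5) {
  secs <- as.numeric(x)  # seconds since 1970-01-01 00:00:00 UTC
  as.POSIXct(floor(secs / width) * width, origin = "1970-01-01", tz = "UTC")
}

# Same data as in the question, with time parsed as POSIXct (UTC assumed)
time  <- as.POSIXct(c("2016-04-13 23:07:45", "2016-04-13 23:07:50",
                      "2016-04-13 23:08:45", "2016-04-13 23:08:45",
                      "2016-04-13 23:08:45", "2016-04-13 23:07:50",
                      "2016-04-13 23:07:51"), tz = "UTC")
group <- c("A", "A", "A", "B", "B", "B", "B")
value <- c(5, 10, 2, 2, NA, 1, 4)
df <- data.frame(time, group, value)

df$timeGroup <- floor5(df$time)

# Sum value per group and 5-second bin. na.action = na.pass keeps the NA rows
# so that sum(..., na.rm = TRUE) handles them, as in Group5().
res <- aggregate(value ~ group + timeGroup, data = df, FUN = sum,
                 na.rm = TRUE, na.action = na.pass)
res <- res[order(res$group, res$timeGroup), ]
res
``````

Because a timestamp that falls exactly on a boundary (such as `23:07:50`) floors to itself, it belongs to the interval that starts there rather than the one that ends there.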

Here is a larger, randomized example for `Group5`:

``````
set.seed(500)
time <- ymd_hms('2016-04-13 23:07:45') + sample(-10^3:10^3, 10^5, replace=TRUE)
group <- rep(LETTERS[1:20], each = 5000)
value <- rep(NA, 10^5)
value[sample(10^5, 95000)] <- sample(100, 95000, replace=TRUE)
df2 <- data.frame(time,group,value)

> head(df2)
                 time group value
1 2016-04-13 23:18:53     A    53
2 2016-04-13 23:15:15     A    NA
3 2016-04-13 23:23:36     A    40
4 2016-04-13 23:06:40     A    23
5 2016-04-13 23:18:10     A    74
6 2016-04-13 22:57:56     A    65
``````

Calling `Group5` on it gives:

``````
Group5(df2)
Source: local data frame [8,020 x 3]
Groups: group [?]

group           timeGroup `sum(value, na.rm = TRUE)`
<fctr>              <dttm>                      <int>
1       A 2016-04-13 22:51:05                        379
2       A 2016-04-13 22:51:10                        646
3       A 2016-04-13 22:51:15                        391
4       A 2016-04-13 22:51:20                       1118
5       A 2016-04-13 22:51:25                        745
6       A 2016-04-13 22:51:30                        546
7       A 2016-04-13 22:51:35                        884
8       A 2016-04-13 22:51:40                        711
9       A 2016-04-13 22:51:45                        526
10      A 2016-04-13 22:51:50                        484
# ... with 8,010 more rows
``````
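
The question also mentions `data.table`. As a rough sketch of the same idea (not from the original answer), the floored timestamp can be computed directly inside `by`. The column name `total` is an illustrative choice, and the timestamps are assumed to be UTC.

``````r
library(data.table)
library(lubridate)  # used here only for floor_date()

# Same data as in the question, with time parsed as POSIXct (UTC assumed)
dt <- data.table(
  time  = as.POSIXct(c("2016-04-13 23:07:45", "2016-04-13 23:07:50",
                       "2016-04-13 23:08:45", "2016-04-13 23:08:45",
                       "2016-04-13 23:08:45", "2016-04-13 23:07:50",
                       "2016-04-13 23:07:51"), tz = "UTC"),
  group = c("A", "A", "A", "B", "B", "B", "B"),
  value = c(5, 10, 2, 2, NA, 1, 4)
)

# Group by `group` and the start of each 5-second interval, then sum value
res <- dt[, .(total = sum(value, na.rm = TRUE)),
          by = .(group, timeGroup = floor_date(time, unit = "5 seconds"))]
setorder(res, group, timeGroup)
res
``````

On the small example this should give the same five group-interval sums as `Group5(df)`.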