Monish Biswas Monish Biswas - 1 month ago 7
R Question

How can I aggregate close time events in R

I need to be able a data frame with events, and output the start, end and count of runs, where runs are where the events are less than a specified time period together.

The data.frame rows are already sorted by time

e.g.

library(lubridate)

ts <- c("2016-10-28 19:21:19",
"2016-10-28 19:21:20",
"2016-10-28 19:21:21",
"2016-10-28 19:21:21",
"2016-10-28 19:23:23",
"2016-10-28 19:23:24",
"2016-10-28 19:23:24",
"2016-10-28 19:23:25",
"2016-10-30 03:59:09",
"2016-10-30 08:54:31",
"2016-10-30 08:54:35"
)

df <- data.frame(time=ymd_hms(ts))


What I would like outputted is a data frame like this, where the interval is 60s from the previous event

start end count
2016-10-28 19:21:19 2016-10-28 19:21:21 4
2016-10-28 19:23:23 2016-10-28 19:23:25 4
2016-10-30 03:59:09 2016-10-30 03:59:09 1
2016-10-30 08:54:31 2016-10-30 08:54:35 2


The actual sequences would be very long, so the solution would need to perform well with a lot (~100k) rows

I've looked at
lag
,
diff
and other functions, but cannot see an easy or efficient way of doing this.

Answer

Here is code using dplyr.

First, it sorts by time just in case they are out of order. Then, it calculates a timeChange using difftime to ensure it returns in seconds and determines if it isBigChange (here, greater than 60 seconds). Then, it uses each TRUE to increment the group number using cumsum (counts each TRUE as a 1). It then group_bys that group, and calculates the summaries you wanted.

df %>%
  arrange(time) %>%
  mutate(timeChange = difftime(time, lag(time, default = time[1])
                               , units = "secs")
         , isBigChange = timeChange > 60
         , group = cumsum(isBigChange)) %>%
  group_by(group) %>%
  summarise(
    start = min(time)
    , end = max(time)
    , count = n()
  )

Returns

  group               start                 end count
  <int>              <dttm>              <dttm> <int>
1     0 2016-10-28 19:21:19 2016-10-28 19:21:21     4
2     1 2016-10-28 19:23:23 2016-10-28 19:23:25     4
3     2 2016-10-30 03:59:09 2016-10-30 03:59:09     1
4     3 2016-10-30 08:54:31 2016-10-30 08:54:35     2