Thomas Speidel Thomas Speidel - 1 month ago 7
R Question

Identify Runs of Consecutive Obs by Group and Reshape

I'm trying to identify runs of consecutive observations, group them and reshape so that start and end of each run occupy a column. Visually:

Example

## REPRODUCIBLE EXAMPLE
> dput(example)
structure(list(id = c(123, 123, 123, 123, 123, 123, 123, 123,
234, 234, 234), date = structure(c(1398816000, 1398902400, 1398988800,
1399075200, 1399161600, 1350777600, 1350864000, 1350950400, 1470009600,
1470096000, 1470182400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
event = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA,
-11L), .Names = c("id", "date", "event"), class = c("tbl_df",
"tbl", "data.frame"))

## GLIMPSE DATA
> dplyr::glimpse(example)
Observations: 11
Variables: 3
$ id <dbl> 123, 123, 123, 123, 123, 123, 123, 123, 234, 234, 234
$ date <dttm> 2014-04-30, 2014-05-01, 2014-05-02, 2014-05-03, 2014-05-04, 2012-10-21, 2012-10-22, 2012-10-23, 2016-08-01, 2016-08-02, 2016-08-03
$ event <fctr> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0


I've broken down the approach as follows:


  1. group data by
    id

  2. rle
    to identify runs of consecutive observations
    within
    id
    (e.g.
    rle(example$event > 0)
    )

  3. reshape from long to wide where min(date) and max(date) (within runs) become columns



I'm not sure how to proceed. The
data.table
solution to a similar question was close, but I wasn't able to re-purpose it.

Answer

Stealing the idea from the other post:

df1 %>% 
  mutate(eventGroup = data.table::rleid(event)) %>% 
  filter(event == 1) %>% 
  group_by(id, eventGroup) %>% 
  summarise(start = min(date),
            end = max(date))

#      id eventGroup      start        end
# 1   123          2 2014-05-01 2014-05-03
# 2   123          4 2012-10-22 2012-10-22
# 3   234          6 2016-08-02 2016-08-02
Comments