Thomas Speidel - 1 month ago 7
R Question

# Identify Runs of Consecutive Obs by Group and Reshape

I'm trying to identify runs of consecutive observations, group them and reshape so that start and end of each run occupy a column. Visually:

``````## REPRODUCIBLE EXAMPLE
> dput(example)
structure(list(id = c(123, 123, 123, 123, 123, 123, 123, 123,
234, 234, 234), date = structure(c(1398816000, 1398902400, 1398988800,
1399075200, 1399161600, 1350777600, 1350864000, 1350950400, 1470009600,
1470096000, 1470182400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
event = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
1L), .Label = c("0", "1"), class = "factor")), row.names = c(NA,
-11L), .Names = c("id", "date", "event"), class = c("tbl_df",
"tbl", "data.frame"))

## GLIMPSE DATA
> dplyr::glimpse(example)
Observations: 11
Variables: 3
\$ id    <dbl> 123, 123, 123, 123, 123, 123, 123, 123, 234, 234, 234
\$ date  <dttm> 2014-04-30, 2014-05-01, 2014-05-02, 2014-05-03, 2014-05-04, 2012-10-21, 2012-10-22, 2012-10-23, 2016-08-01, 2016-08-02, 2016-08-03
\$ event <fctr> 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0
``````

I've broken down the approach as follows:

1. group data by
`id`

2. `rle`
to identify runs of consecutive observations
within
`id`
(e.g.
`rle(example\$event > 0)`
)

3. reshape from long to wide where min(date) and max(date) (within runs) become columns

I'm not sure how to proceed. The
`data.table`
solution to a similar question was close, but I wasn't able to re-purpose it.

Stealing the idea from the other post:

``````df1 %>%
mutate(eventGroup = data.table::rleid(event)) %>%
filter(event == 1) %>%
group_by(id, eventGroup) %>%
summarise(start = min(date),
end = max(date))

#      id eventGroup      start        end
# 1   123          2 2014-05-01 2014-05-03
# 2   123          4 2012-10-22 2012-10-22
# 3   234          6 2016-08-02 2016-08-02
``````