user1357015 - 1 year ago 71
R Question

# Group Data By Trip in R

I have data in the following format:

``````   DeviceId               VIN Latitude Longitude            TrueTime
1 S5353532523              XY1  37.51874 -87.47483 2016-06-05 18:46:00
2 S5353532523              XY1  37.52975 -87.47588 2016-06-05 18:46:00
3 S5353532523              XY1  37.53472 -87.47734 2016-06-05 18:47:00
4 S5353532523              XY1  37.53769 -87.47846 2016-06-05 18:47:00
5 S5353532523              XY1  37.54271 -87.47963 2016-06-05 18:47:00
6 S5353532523              XY1  37.54780 -87.47942 2016-06-05 18:47:00
...
``````

I want to group this data into trips. Using dplyr I start with:

`````` Data %>% group_by(VIN, DeviceID) %>% ?
``````

However I'm curious what I should put in the question mark. Essentially I want to add a column that assigns a tripID from 1 onwards after the previous time increment is larger than 5 minutes.

So at some point, when the TrueTime changes by more than 5 minutes, the tripCounter goes up by 1. Also, it needs to increment over VINs and DeviceID as well (so the counter shouldn't reset to 1 at the start of each group).

We can use `difftime` to get the difference between the adjacent elements of each group, specify the `units` as `"mins"`, create a logical index, `cumsum` to create the 'TripID'

``````Data %>%
group_by(VIN, DeviceId) %>%
mutate(TripID = cumsum(c(TRUE, difftime(TrueTime[-1],
TrueTime[-n()], units = "mins")>5)))
``````

It is not clear about the resetting part. If this is based on 'TrueTime' greater than 5 min across the groups, we don't need the `group_by`

``````Data %>%
mutate(TripID = cumsum(c(TRUE, difftime(TrueTime[-1],
TrueTime[-nrow(Data)], units = "mins")>5)))
``````

Or if it needs to add after doing the `group_by`

``````Data %>%
group_by(VIN, DeviceId) %>%
mutate(TripID = cumsum(c(TRUE, difftime(TrueTime[-1],
TrueTime[-n()], units = "mins")>5))) %>%
ungroup() %>%
mutate(TripID = group_indices_(., .dots = c("VIN", "DeviceId"))-1 + TripID)
``````

NOTE: The 'TrueTime' class assumed is `POSIXct`

### data

``````Data <- structure(list(DeviceId = c("S5353532523", "S5353532523", "S5353532523",
"S5353532523", "S5353532523", "S5353532523", "S5353532523", "S5353532523",
"S5353532523", "S5353532523", "S5353532523", "S5353532523"),
VIN = c("XY1", "XY1", "XY1", "XY1", "XY1", "XY1", "XY2",
"XY2", "XY2", "XY2", "XY2", "XY2"), Latitude = c(37.51874,
37.52975, 37.53472, 37.53769, 37.54271, 37.5478, 37.51874,
37.52975, 37.53472, 37.53769, 37.54271, 37.5478), Longitude = c(-87.47483,
-87.47588, -87.47734, -87.47846, -87.47963, -87.47942, -87.47483,
-87.47588, -87.47734, -87.47846, -87.47963, -87.47942), TrueTime = structure(c(1465132560,
1465132560, 1465132620, 1465132620, 1465133040, 1465133040,
1465132560, 1465132560, 1465133100, 1465133160, 1465133160,
1465133160), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("DeviceId",
"VIN", "Latitude", "Longitude", "TrueTime"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download