dimebucker91 dimebucker91 - 2 months ago 6
R Question

creating row index based on time difference in R

I have data that looks like:

player event diff
A x NA
A y 2
A z 240
A w 3
A x 9
B x NA
B y 3
B z 120
C x NA
C x 8


What I did to get this was to group by the player column and take the difference between time events, hence the NA's for diff column whenever a new player has an event.

What I want to do is to partition the data into player specific interactions that are within a few minutes of each other (say a cutoff of diff = 20). What I want in the end is to have:

player event diff interaction
A x NA 1
A y 2 1
A z 240 2
A w 3 2
A x 9 2
B x NA 1
B y 3 1
B z 120 2
C x NA 1
C x 8 1


So basically the interactions are grouped based on having the same player and the difference being less than 20, otherwise a new interaction is started. A new interaction is also started if an NA is present. I'm not really sure how to do this in a fast/efficient way as I've got a large data set with many players. My preference is for a
dplyr
solution

Answer

You can replace NA with 0(or other number that is below your threshold) using coalesce in the diff column and do a cumsum on the diff >= 20 condition, which will give a distinct id whenever diff exceed some threshold:

library(dplyr)
df %>% group_by(player) %>% 
       mutate(interaction = cumsum(coalesce(diff, 0L) >= 20) + 1)

# Source: local data frame [10 x 4]
# Groups: player [3]

#    player  event  diff interaction
#     <fctr> <fctr> <int>       <dbl>
# 1       A      x    NA           1
# 2       A      y     2           1
# 3       A      z   240           2
# 4       A      w     3           2
# 5       A      x     9           2
# 6       B      x    NA           1
# 7       B      y     3           1
# 8       B      z   120           2
# 9       C      x    NA           1
# 10      C      x     8           1
Comments