Stat Stat - 3 months ago
R Question

Check overlapping begin and end times by group in R

I want to check for overlapping Begin/End intervals within each ID. Here is the data:

ID <- c(rep(1,3), rep(3, 5), rep(4,4),rep(5,5))
Begin <- c(0,2.5,3,7,8,7,25,25,10,15,17,20,1,NA,10,11,13)
End <- c(1.5,3.5,6,12,8,11,29,35, 12,19,NA,28,5,20,30,20,25)
df <- data.frame(ID, Begin, End)
df
ID Begin End
1 1 0.0 1.5
2 1 2.5 3.5
3 1 3.0 6.0*
4 3 7.0 12.0
5 3 8.0 8.0*
6 3 7.0 11.0*
7 3 25.0 29.0
8 3 25.0 35.0*
9 4 10.0 12.0
10 4 15.0 19.0
11 4 17.0 NA*
12 4 20.0 28.0
13 5 1.0 5.0
14 5 NA 20.0
15 5 10.0 30.0*
16 5 11.0 20.0*
17 5 13.0 25.0*


* marks a row whose interval overlaps an earlier interval of the same ID:


  • For row 3 (ID = 1), Begin = 3.0 is smaller than the previous End of 3.5, so set Begin_New = 3.5.

  • For ID = 3 it is different. In row 5, Begin = 8.0 is smaller than 12.0, so we set Begin_New = 12. But the overlap keeps going: in row 6, comparing Begin = 7.0 with only the previous End (8.0) is not correct, because the highest End so far in this ID is still 12.0, so Begin_New should be 12 as well.
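The rule in these two bullets amounts to comparing each Begin with the largest End among all earlier rows of the same ID. As a minimal base R sketch (assuming the rows are already in the intended order within each ID; `prev_max_end` and the `overlap` column are illustrative names, not part of the question), the starred rows can be flagged like this:

```r
# Sample data from the question
ID    <- c(rep(1, 3), rep(3, 5), rep(4, 4), rep(5, 5))
Begin <- c(0, 2.5, 3, 7, 8, 7, 25, 25, 10, 15, 17, 20, 1, NA, 10, 11, 13)
End   <- c(1.5, 3.5, 6, 12, 8, 11, 29, 35, 12, 19, NA, 28, 5, 20, 30, 20, 25)
df <- data.frame(ID, Begin, End)

# Largest End among all *earlier* rows of the same ID (NA for the first row).
# Note: cummax() propagates NA, so a missing End makes later maxima NA.
prev_max_end <- ave(df$End, df$ID,
                    FUN = function(e) c(NA, head(cummax(e), -1)))

# Flag a row when it starts before that running maximum;
# rows with a missing Begin or a missing maximum are not flagged.
df$overlap <- !is.na(df$Begin) & !is.na(prev_max_end) & df$Begin < prev_max_end

which(df$overlap)
```

On the sample data this flags rows 3, 5, 6, 8, 11, 15, 16 and 17, the same rows that are starred in the output design.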



Here is the output I want (Begin_New is the adjusted Begin):

ID Begin End Begin_New
1 1 0.0 1.5 0.0
2 1 2.5 3.5 2.5
3 1 3.0 6.0 3.5*
4 3 7.0 12.0 7.0
5 3 8.0 8.0 12.0*
6 3 7.0 11.0 12.0*
7 3 25.0 29.0 25.0
8 3 25.0 35.0 29.0*
9 4 10.0 12.0 10.0
10 4 15.0 19.0 15.0
11 4 17.0 NA 19.0*
12 4 20.0 28.0 20.0
13 5 1.0 5.0 1.0
14 5 NA 20.0 NA
15 5 10.0 30.0 20.0*
16 5 11.0 20.0 30.0*
17 5 13.0 25.0 30.0*


When I use the code below, I don't get the output I want: it shifts End by only one row, so each row is compared only with the row directly before it.

setDT(df)[, Begin_New := shift(End), by = ID][!which(Begin < Begin_New), Begin_New:= Begin]
ID Begin End Begin_New
1: 1 0.0 1.5 0.0
2: 1 2.5 3.5 2.5
3: 1 3.0 6.0 3.5
4: 3 7.0 12.0 7.0
5: 3 8.0 8.0 12.0
6: 3 7.0 11.0 8.0
7: 3 25.0 29.0 25.0
8: 3 25.0 35.0 29.0
9: 4 10.0 12.0 10.0
10: 4 15.0 19.0 15.0
11: 4 17.0 NA 19.0
12: 4 20.0 28.0 20.0
13: 5 1.0 5.0 1.0
14: 5 NA 20.0 NA
15: 5 10.0 30.0 20.0
16: 5 11.0 20.0 30.0
17: 5 13.0 25.0 20.0


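This is the output I get, which is not what I want: rows 6 and 17 are compared with the previous End only (8.0 and 20.0) instead of the highest End so far in their ID (12.0 and 30.0).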
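To see why shifting by one row is not enough, here is a small base R comparison for the ID = 3 rows (a sketch; `prev_end` and `prev_max_end` are illustrative names). For row 6 (Begin = 7.0, the third value below), the previous End is 8.0, but the largest End so far is 12.0:

```r
# End values for the ID = 3 rows (rows 4 to 8) from the question
End <- c(12, 8, 11, 29, 35)

# What shift(End) compares against: only the immediately previous End
prev_end <- c(NA, head(End, -1))

# What the comparison needs: the largest End among all earlier rows
prev_max_end <- c(NA, head(cummax(End), -1))

print(prev_end)      # NA 12  8 11 29
print(prev_max_end)  # NA 12 12 12 29
```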

Answer

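I think your code is pretty much right; you just need to compare each Begin with the running maximum of End over the earlier rows (cummax) instead of only the previous End: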

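library(data.table)
setDT(df)  # the data from the question, converted to a data.table

df[, Begin_New := {
  # highest End among all *earlier* rows of this ID; the first row has
  # no predecessor, so it is compared with its own Begin (never overlaps)
  high_so_far = shift(cummax(End), fill=Begin[1L])
  # rows starting before that running maximum overlap: move their Begin up
  w           = which(Begin < high_so_far)
  Begin[w]    = high_so_far[w]

  Begin
}, by=ID]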
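One caveat with cummax: once End is NA, cummax returns NA for every later row of that ID, so those rows are never adjusted. In the sample data this is harmless (row 12 starts at 20.0, after the previous maximum End of 19.0), but if a missing End can come before a genuine overlap, a variant like the following may be safer. It is a sketch that assumes a missing End should simply be ignored (treated as -Inf), and it swaps the which()/assignment step for pmax(), which gives the same result here; `dt` is an illustrative name:

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ID    = c(rep(1, 3), rep(3, 5), rep(4, 4), rep(5, 5)),
  Begin = c(0, 2.5, 3, 7, 8, 7, 25, 25, 10, 15, 17, 20, 1, NA, 10, 11, 13),
  End   = c(1.5, 3.5, 6, 12, 8, 11, 29, 35, 12, 19, NA, 28, 5, 20, 30, 20, 25)
)

dt[, Begin_New := {
  # Assumption: a missing End should not block later rows, so treat it as -Inf
  end_filled  <- replace(End, is.na(End), -Inf)
  # Highest End among all earlier rows of this ID (-Inf for the first row)
  high_so_far <- shift(cummax(end_filled), fill = -Inf)
  # Overlapping rows move up to that maximum; a missing Begin stays NA
  pmax(Begin, high_so_far)
}, by = ID]

print(dt)
```

On the question's data this reproduces the Begin_New column of the output design.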