Mártonka Mártonka - 2 years ago 71
R Question

Defining closest observation in a data table

I have a

data.table
like the following:

ID Date Flag_1 Flag_2
1 2016-01-01 1 0
1 2016-02-01 0 0
1 2016-03-01 0 1
1 2016-04-01 0 0
1 2016-05-01 1 0
1 2016-06-01 0 0
1 2016-07-01 0 1
2 2017-03-01 0 0
2 2017-04-01 1 0
2 2017-05-01 0 0
2 2017-06-01 0 0
2 2017-07-01 0 1


I would like to create a
Flag_1_date
variable, which links the date of the preceding flag_1 event to the flag_2 event.

It is important to me, that only those flag_1 dates should be linked to the flag_2 event, which has the same ID as the flag_2 event.

ID Date Flag_1 Flag_2 Flag_1_date
1 2016-01-01 1 0 0
1 2016-02-01 0 0 0
1 2016-03-01 0 1 2016-01-01
1 2016-04-01 0 0 0
1 2016-05-01 1 0 0
1 2016-06-01 0 0 0
1 2016-07-01 0 1 2016-05-01
2 2017-03-01 0 0 0
2 2017-04-01 1 0 0
2 2017-05-01 0 0 0
2 2017-06-01 0 0 2017-04-01
2 2017-07-01 0 1 0

Answer Source

You could use na.locf() from the zoo package to forward fill Date when Flag_1==1 by "ID", and subsequently omit the values where Flag_2==0:

library(zoo)
library(data.table)

dt[Flag_1==1, Flag_1_date := Date][
  ,Flag_1_date := na.locf(Flag_1_date), "ID"][
   Flag_2==0, Flag_1_date := NA
  ]
> dt
#   ID       Date Flag_1 Flag_2 Flag_1_date
# 1:  1 2016-01-01      1      0        <NA>
# 2:  1 2016-02-01      0      0        <NA>
# 3:  1 2016-03-01      0      1  2016-01-01
# 4:  1 2016-04-01      0      0        <NA>
# 5:  1 2016-05-01      1      0        <NA>
# 6:  1 2016-06-01      0      0        <NA>
# 7:  1 2016-07-01      0      1  2016-05-01
# 8:  2 2017-03-01      0      0        <NA>
# 9:  2 2017-04-01      1      0        <NA>
#10:  2 2017-05-01      0      0        <NA>
#11:  2 2017-06-01      0      0        <NA>
#12:  2 2017-07-01      0      1  2017-04-01
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download