ddunn801 ddunn801 - 14 days ago 6
R Question

Identify records where a given sequence of events happens within x days

I have a large data.table, similar in structure to

df
:

library("data.table")
df <- data.frame(part = c("A", "B", "A", "C", "A", "D", "B", "D", "E"),
day = c(1, 2, 3, 4, 5, 6, 6, 7, 15),
code = c("S", "S", "P", "X", "P", "S", "P", "P", "P"))
setDT(df)
df
part day code
1: A 1 S
2: B 2 S
3: A 3 P
4: C 4 X
5: A 5 P
6: D 6 S
7: B 6 P
8: D 7 P
9: E 15 P


How can I add a column that flags records where
code
=
S
and the same
part
has
code
=
P
within 3 subsequent days? Expected result:

part day code flag
1: A 1 S TRUE
2: B 2 S FALSE
3: A 3 P FALSE
4: C 4 X FALSE
5: A 5 P FALSE
6: D 6 S TRUE
7: B 6 P FALSE
8: D 7 P FALSE
9: E 15 P FALSE

Answer

I think this does it

df[, v := FALSE ]
df[code == "S", v := !is.na(
  df[code == "P"][df[code == "S"], on=c("part", "day"), roll=-3, which=TRUE]
)]

   part day code     v
1:    A   1    S  TRUE
2:    B   2    S FALSE
3:    A   3    P FALSE
4:    C   4    X FALSE
5:    A   5    P FALSE
6:    D   6    S  TRUE
7:    B   6    P FALSE
8:    D   7    P FALSE
9:    E  15    P FALSE

How it works !is.na(x[i, which=TRUE]) tells us whether each row of i found a match in x. (This might break down if i finds multiple matches.) The roll part expands the scope of matches to cover rows that are nearby in terms of the last column being joined on.

I'm not sure that the roll value there is correct since I have never used it this way.