ddunn801 - 6 months ago 32
R Question

# Identify records where a given sequence of events happens within x days

I have a large data.table, similar in structure to

`df`
:

``````library("data.table")
df <- data.frame(part = c("A", "B", "A", "C", "A", "D", "B", "D", "E"),
day = c(1, 2, 3, 4, 5, 6, 6, 7, 15),
code = c("S", "S", "P", "X", "P", "S", "P", "P", "P"))
setDT(df)
df
part day code
1:    A   1    S
2:    B   2    S
3:    A   3    P
4:    C   4    X
5:    A   5    P
6:    D   6    S
7:    B   6    P
8:    D   7    P
9:    E  15    P
``````

How can I add a column that flags records where
`code`
=
`S`
and the same
`part`
has
`code`
=
`P`
within 3 subsequent days? Expected result:

``````   part day code  flag
1:    A   1    S  TRUE
2:    B   2    S FALSE
3:    A   3    P FALSE
4:    C   4    X FALSE
5:    A   5    P FALSE
6:    D   6    S  TRUE
7:    B   6    P FALSE
8:    D   7    P FALSE
9:    E  15    P FALSE
``````

I think this does it

``````df[, v := FALSE ]
df[code == "S", v := !is.na(
df[code == "P"][df[code == "S"], on=c("part", "day"), roll=-3, which=TRUE]
)]

part day code     v
1:    A   1    S  TRUE
2:    B   2    S FALSE
3:    A   3    P FALSE
4:    C   4    X FALSE
5:    A   5    P FALSE
6:    D   6    S  TRUE
7:    B   6    P FALSE
8:    D   7    P FALSE
9:    E  15    P FALSE
``````

How it works `!is.na(x[i, which=TRUE])` tells us whether each row of `i` found a match in `x`. (This might break down if `i` finds multiple matches.) The `roll` part expands the scope of matches to cover rows that are nearby in terms of the last column being joined `on`.

I'm not sure that the `roll` value there is correct since I have never used it this way.