Gerald T Gerald T - 10 days ago 6
R Question

r identify groups based on conditions

I am trying a clear way to identify groups based on condition on multiple column.

As an example imagine we have a table of points ordered by date and another numeric column:

DT <- data.table(a = paste0("Date", 1:10), V1 = c(1, 2, 11, 9, 11, 11, 2, 2, 11, 11))

a V1
1: Date1 1
2: Date2 2
3: Date3 11
4: Date4 9
5: Date5 11
6: Date6 11
7: Date7 2
8: Date8 2
9: Date9 11
10: Date10 11


Every time a value in
V1 > 10
we have a new group obtaining the following table:

a V1 ID
1: Date1 1 1
2: Date2 2 1
3: Date3 11 2
4: Date4 9 2
5: Date5 11 3
6: Date6 11 4
7: Date7 2 4
8: Date8 2 4
9: Date9 11 5
10: Date10 11 6


The tables are over 3million point so no loops please. Any help? Thank you very much!!!

Answer

We need cumsum on the logical vector (V1 > 10), add 1 to it in case it starts from 0 and assign (:=) to create the 'ID' column

DT[, ID := cumsum(V1>10)+1]
DT$ID
#[1] 1 1 2 2 3 4 4 4 5 6
Comments