Phdaml - 1 year ago 76
R Question

# Create a counter variable with a Boolean condition using the value from the previous row

I want to create a counter variable c based on the grouping variable user and the true/false variable B.

``````library(data.table)
DT <- data.table(time=c(1,2,3,1,1,2,3,1,2,1), user=c(1,1,1,2,3,3,3,4,4,5), B=c('t','f','t','f','f','t','t','t','t','t'))
DT
``````

The desired output for variable c (column C):

``````    time user B C
 1:    1    1 t 1
 2:    2    1 f 1
 3:    3    1 t 2
 4:    1    2 f 0
 5:    1    3 f 0
 6:    2    3 t 1
 7:    3    3 t 2
 8:    1    4 t 1
 9:    2    4 t 2
10:    1    5 t 1
``````

Variable c is a running counter within each user group that increases whenever B is true. The logic (NOT code) for variable c is as follows. Row order matters, as you can see from the time variable.

``````if time==1 and B=='f' {c=0}
else
{
if B=='t' {c=previous[c]+1}
else {c=previous[c]}
}

# If there were no variable B, the counter could be created with dplyr:
DT %>% group_by(user) %>% mutate(c = seq_along(user))
# or with data.table:
DT[, c := seq_len(.N), by = user]
# data.table's shift() combined with a for loop would work, but I want to avoid a for loop: it is slow and I have 300,000 rows.
``````

We group by 'user', take the `cumsum` of the logical vector (`B == "t"`, where each `TRUE` counts as 1), and assign (`:=`) the output to 'C'.

``````DT[, C := cumsum(B == "t"), by = user]
DT
#    time user B C
# 1:    1    1 t 1
# 2:    2    1 f 1
# 3:    3    1 t 2
# 4:    1    2 f 0
# 5:    1    3 f 0
# 6:    2    3 t 1
# 7:    3    3 t 2
# 8:    1    4 t 1
# 9:    2    4 t 2
#10:    1    5 t 1
``````
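As a sanity check, the rule stated in the question can be translated row by row and compared with the `cumsum` result. This is only a minimal sketch: the helper name `counter_by_rule` and the column `C_loop` are made up for illustration, and each user's rows are assumed to be sorted by `time` already.

```r
library(data.table)

DT <- data.table(
  time = c(1, 2, 3, 1, 1, 2, 3, 1, 2, 1),
  user = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 5),
  B    = c("t", "f", "t", "f", "f", "t", "t", "t", "t", "t")
)

# Hypothetical helper: literal translation of the question's rule for the
# rows of one user (assumed to be in time order).
counter_by_rule <- function(b) {
  out  <- integer(length(b))
  prev <- 0L                             # counter value before the first row
  for (i in seq_along(b)) {
    if (b[i] == "t") prev <- prev + 1L   # 't' adds one to the previous value
    out[i] <- prev                       # 'f' carries the previous value forward
  }
  out
}

DT[, C_loop := counter_by_rule(B), by = user]   # slow, explicit version
DT[, C      := cumsum(B == "t"),   by = user]   # vectorised version

# TRUE if both approaches agree on every row
all(DT$C == DT$C_loop)
```

The loop is there only to confirm the logic on a small table; for the 300,000 rows mentioned in the question, the `cumsum` form is the one to use.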

The same logic can be applied with `dplyr`:

``````library(dplyr)
DT %>%
  group_by(user) %>%
  mutate(C = cumsum(B == "t"))
``````
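Unlike `:=`, a `dplyr` pipeline returns a new object and leaves its input unchanged, so the result has to be assigned to keep it. A hedged sketch follows, assuming a plain data frame (the names `df` and `result` are illustrative) and sorting by `user` and `time` first, since the question says row order matters:

```r
library(dplyr)

df <- data.frame(
  time = c(1, 2, 3, 1, 1, 2, 3, 1, 2, 1),
  user = c(1, 1, 1, 2, 3, 3, 3, 4, 4, 5),
  B    = c("t", "f", "t", "f", "f", "t", "t", "t", "t", "t")
)

result <- df %>%
  arrange(user, time) %>%            # make sure rows are in time order per user
  group_by(user) %>%
  mutate(C = cumsum(B == "t")) %>%   # running count of 't' rows within each user
  ungroup()                          # drop grouping so later steps are not per-user

result
```

Calling `ungroup()` at the end is optional, but it avoids surprises if further `mutate()` or `summarise()` calls are added later.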
