Boris Boris - 3 months ago 9
R Question

How to generate the number of prior years an individual has had its current score?

I simply want to generate a variable that counts the number of consecutive prior years in which an individual had its current score.

For example, take this reproducible data set:

set.seed(987)
mydata <- data.frame(
personID = rep(c(1:10), each = 10),
year= rep(c(1991:2000), each = 1),
score = sample(c(0, 1, 2), 100, replace = TRUE)
)


Here are the rows for personID == 5:

personID year score
5 1991 2
5 1992 1
5 1993 0
5 1994 0
5 1995 0
5 1996 0
5 1997 2
5 1998 0
5 1999 1
5 2000 1


What I want to generate is a variable Z, which should look like this:

personID year score Z
5 1991 2 0
5 1992 1 0
5 1993 0 0
5 1994 0 1
5 1995 0 2
5 1996 0 3
5 1997 2 0
5 1998 0 0
5 1999 1 0
5 2000 1 1


I have been trying to do this with the following code:

mydata1 <- with(mydata, ave(score, personID, FUN=
function(x) cumsum(c(TRUE, diff(x)<0))))
mydata$Z <- with(mydata, ave(mydata1, mydata1, personID, FUN= seq_along)-1)


It doesn't do the job. I need to specify somehow that what I want to count is the current score (the number of years in a row for which an individual has had score 0, 1 or 2). The diff(x)<0 part is also wrong: I tried different things, but in the end I could not find a way to get rid of it.

Answer

Here's a possible solution based on run-length encoding, using data.table for convenience:

library(data.table)
setDT(mydata)[, Z := 1:.N - 1L, by = .(personID, rleid(score))]

# Check results
mydata[personID == 5]
#     personID year score Z
#  1:        5 1991     2 0
#  2:        5 1992     1 0
#  3:        5 1993     0 0
#  4:        5 1994     0 1
#  5:        5 1995     0 2
#  6:        5 1996     0 3
#  7:        5 1997     2 0
#  8:        5 1998     0 0
#  9:        5 1999     1 0
# 10:        5 2000     1 1
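
To see why grouping by runs gives the desired counter, here is a minimal base R sketch of the same idea on a single person. It uses rle() and sequence() instead of data.table's rleid(), and the score vector is simply copied from the personID == 5 rows in the question, so it does not depend on the random seed.

```r
# Scores for personID 5, copied from the question (ordered by year)
score <- c(2, 1, 0, 0, 0, 0, 2, 0, 1, 1)

# rle() splits the vector into runs of identical consecutive values
runs <- rle(score)
runs$lengths  # 1 1 4 1 1 2
runs$values   # 2 1 0 2 0 1

# sequence() counts 1..n inside each run; subtracting 1 turns the
# position within the run into "number of prior years with this score"
Z <- sequence(runs$lengths) - 1
Z  # 0 0 0 1 2 3 0 0 0 1
```

Grouping by rleid(score) in the data.table call plays the same role as runs$lengths here: every run of equal scores becomes its own group, and the row index within that group minus one is Z.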

Or, using the development version (v >= 1.9.7), you could use rowid instead:

setDT(mydata)[, Z := rowid(score) - 1L, by = .(personID, rleid(score))]
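
If you prefer to stay in base R, the attempt from the question can probably be repaired with one change: start a new run whenever the score changes (diff(x) != 0), not only when it decreases (diff(x) < 0). The sketch below assumes the rows are already sorted by year within each person; the toy data frame is hypothetical and only serves to show that runs do not carry over from one person to the next.

```r
# Hypothetical toy data: person 1 ends on score 0 and person 2 starts on 0
toy <- data.frame(
  personID = rep(1:2, each = 4),
  year     = rep(1991:1994, times = 2),
  score    = c(1, 1, 0, 0,   0, 2, 2, 2)
)

# Run counter per person: a new run starts whenever the score changes
run <- with(toy, ave(score, personID,
                     FUN = function(x) cumsum(c(TRUE, diff(x) != 0))))

# Position within each (person, run) group, minus 1
toy$Z <- ave(run, toy$personID, run, FUN = seq_along) - 1
toy$Z  # 0 1 0 1 0 0 1 2
```

Because the run counter is computed per person, the 0 at the start of person 2 gets Z = 0 even though person 1 ended on the same score.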