MikeD MikeD - 1 month ago 6
R Question

Create a data column in an R dataframe based on a sequential comparison

I am having an issue trying to generate values based off of a sequential conditional comparison from values in another column of a data frame.

Here are example Data:

ID Tracked

1 Yes
1 Yes
1 No
1 No
1 Yes
1 Yes
2 Yes
2 No
2 No
2 Yes
2 Yes
2 Yes
2 No
2 Yes


What I would like is to add a third column based off of whether ID has at least a previous and successive Yes like this.

ID Tracked Seq

1 Yes NA
1 Yes 1
1 No 0
1 No 0
1 Yes 0
1 Yes 1
2 Yes NA
2 No 0
2 No 0
2 Yes 0
2 Yes 1
2 Yes 1
2 No 0
2 Yes 0


Many thanks for all your help. I have tried numerous things and can not seem to come up with simple code but it is looking like I will need to construct a nested for loop.

The latest for loop I have tried (without looping over per ID yet as a test) is not working, just returning all zeros.

for (i in length(Data)){
Data$Seq[i+1] <- ifelse(Data$Tracked[i]==Data$Tracked[i+1], 1, 0)
}


And the solution that works with the help of @parksw3

for (i in 1:(nrow(df)-1)){
df$Seq[1] <- NA
df$Seq[i+1] <-
ifelse(df$Tracked[i]==df$Tracked[i+1], 1, 0) *
ifelse(df$ID[i] == df$ID[i+1], 1, NA)
}

Answer

I think this does what you want to do.

df <- read.table(
    text = "
    ID  Tracked

    1   Yes
    1   Yes
    1   No
    1   No
    1   Yes
    1   Yes
    2   Yes
    2   No
    2   No
    2   Yes
    2   Yes
    2   Yes
    2   No
    2   Yes",
    header = TRUE
)

tmp gives you a TRUE/FALSE vector that tells you whether the answer is yes or not. tmp2 gives you a TRUE/FALSE vector that tells you whether the previous answer is yes or not, by adding NA to the beginning and removing the last element. Multiplying two will return 1 if both the previous and the current answers are yes and 0 otherwise.

dftmp <- aggregate(.~ID, df,
    FUN = function(x){
        tmp <- x == 2 ## If you want to use "Yes" instead, replace df with lapply(df, as.character) and replace 2 with "Yes"
        tmp2 <- c(NA, tmp[-length(tmp)])
        tmp * tmp2
    })

df$Seq <- unlist(dftmp[,-1])

Result:

print(df)
##    ID Tracked Seq
## 1   1     Yes  NA
## 2   1     Yes   1
## 3   1      No   0
## 4   1      No   0
## 5   1     Yes   0
## 6   1     Yes   1
## 7   2     Yes  NA
## 8   2      No   0
## 9   2      No   0
## 10  2     Yes   0
## 11  2     Yes   1
## 12  2     Yes   1
## 13  2      No   0
## 14  2     Yes   0

Also, to fix your code, it's somewhat sloppy but this should work:

for (i in 1:(nrow(df)-1)){
    df$Seq[i+1] <- 
        ifelse(df$Tracked[i]==df$Tracked[i+1], 1, 0) *
        ifelse(df$ID[i] == df$ID[i+1], 1, NA)
}