cimentadaj - 1 month ago
R Question

Dynamically replace values of a column based on available values from another column

Suppose I have this data frame:

set.seed(2)
df <- data.frame(c1 = sample(c(0:3, NA), 50, replace = TRUE),
                 c2 = sample(c(0:3, NA), 50, replace = TRUE),
                 c3 = sample(c(0:3, NA), 50, replace = TRUE),
                 c4 = sample(c(0:3, NA), 50, replace = TRUE))

head(df)
c1 c2 c3 c4
1 0 0 1 0
2 3 0 2 1
3 2 3 NA NA
4 0 NA NA 1
5 NA 1 1 3
6 NA NA 2 1


When c4 is 0, I'd like to replace it with the nearest available non-NA value to its left: c3 first, then c2 if c3 is NA, and so on.

I'm trying to learn how to do it, so don't just throw in the answer! If it's alright, suggest possible solutions. Thanks in advance.

Edit:

Expected output:

head(df)
c1 c2 c3 c4
1 0 0 1 1 # This would be the only difference with the head output from above
2 3 0 2 1
3 2 3 NA NA
4 0 NA NA 1
5 NA 1 1 3
6 NA NA 2 1

Answer

This is how you can do it without looping through each row:

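c4 <- ncol(df)  # index of the last column (c4)
# For each row: column index of the rightmost value before c4 that is neither NA nor 0
inds <- max.col(!is.na(df[, -c4]) & df[, -c4] != 0, "last")
# Rows where c4 is 0; which() skips the NA comparisons
zeroinds <- which(df[, c4] == 0)
# Matrix indexing picks one (row, column) cell per row
df[zeroinds, c4] <- df[cbind(zeroinds, inds[zeroinds])]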

head(df, 10)

#    c1 c2 c3 c4
# 1   0  0  1  1
# 2   3  0  2  1
# 3   2  3 NA NA
# 4   0 NA NA  1
# 5  NA  1  1  3
# 6  NA NA  2  1
# 7   0  3 NA NA
# 8  NA NA  2  2
# 9   2  3  0  3
# 10  2  3  0  1

Here is how:

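  1. c4 stores the index of the last column.
  2. inds holds, for each row, the column index of the rightmost value before c4 that is neither NA nor zero (the first one you meet moving left from c4). If a row has no such value, max.col still returns the last column before c4, so that row would receive whatever c3 holds (0 or NA).
  3. zeroinds holds the indices of the rows where c4 is zero.
  4. The zeros in those rows are replaced with the value located in step 2.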
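
To see what each step produces, here is a small sketch on a hand-made data frame. The data frame toy and the names left, usable, has_value and fill are made up for this illustration and are not part of the question or the answer code. The sketch also assumes that a row with no usable value to the left of c4 should keep its zero, which the question does not specify.

```r
# Toy data chosen by hand for illustration (not taken from the question).
toy <- data.frame(
  c1 = c(2, 1, NA, 0),
  c2 = c(0, NA, 3, NA),
  c3 = c(NA, 2, 0, 0),
  c4 = c(0, 0, 0, 0)
)

last <- ncol(toy)                  # step 1: index of c4
left <- as.matrix(toy[, -last])    # the columns to the left of c4

# Step 2: TRUE where a cell is usable (neither NA nor 0).
usable <- !is.na(left) & left != 0
# ties.method = "last" returns the rightmost TRUE in each row.
inds <- max.col(usable, ties.method = "last")

# Step 3: rows where c4 is 0; which() drops NA comparisons.
zeroinds <- which(toy[, last] == 0)

# Assumption: rows without any usable value are left untouched.
# max.col() still returns a column for them, so filter them out explicitly.
has_value <- rowSums(usable) > 0
fill <- zeroinds[has_value[zeroinds]]

# Step 4: matrix indexing picks one (row, column) cell per row.
toy[fill, last] <- left[cbind(fill, inds[fill])]

toy$c4
# [1] 2 2 3 0
```

Row 4 has only zeros and NA to the left of c4, so it keeps its 0. Without the has_value filter it would be assigned the value of c3, which happens to be 0 here but could just as well be NA.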