imprela - 10 months ago
R Question

Loop to change outliers of multiple variables to 95% in R

I have some outliers in my dataset. The variables of interest are named j_q3_1, j_q3_2, ..., j_q3_14 and also j_q4_1, j_q4_2, ..., j_q4_14. I want to change entries greater than the 95th percentile to the 95th percentile. I was wondering if I could create a loop that changes the question number (q3 to q4) and also the number after the underscore (1 to 14). Any suggestions will be greatly appreciated.

Example data (suffixes only up to _2; only the q3 and q4 columns should be changed):

test <- data.frame(hhid = c(1:5), j_q3_1 =c(1000,1500,2000,5000,10000), j_q4_1=c(500,100,200,10000,200), j_q5_1 =c(200,300,400,203,100), j_q3_2 =c(300,10000,200,300,200), j_q4_2=c(100,200,320,120,302), j_q5_2=c(10000,120,1222,300,2333))

This code works for me when I repeat it for each variable:

quantiles <- quantile(test$j_q3_1,c(0.95))
test$j_q3_1[test$j_q3_1 > quantiles[1]] <- quantiles[1]

quantiles <- quantile(test$j_q4_1,c(0.95))
test$j_q4_1[test$j_q4_1 > quantiles[1]] <- quantiles[1]

quantiles <- quantile(test$j_q3_2,c(0.95))
test$j_q3_2[test$j_q3_2 > quantiles[1]] <- quantiles[1]

quantiles <- quantile(test$j_q4_2,c(0.95))
test$j_q4_2[test$j_q4_2 > quantiles[1]] <- quantiles[1]

Answer Source

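You can build each column name with paste0() inside a loop over both indices:

for (i in 3:4) for (j in 1:14) {  # use 1:2 for the example data
  cname <- paste0("j_q", i, "_", j)
  quantiles <- quantile(test[, cname], c(0.95))
  test[test[, cname] > quantiles[1], cname] <- quantiles[1]
}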

If you have NA values:

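for (i in 3:4) for (j in 1:14) {  # use 1:2 for the example data
  cname <- paste0("j_q", i, "_", j)
  quantiles <- quantile(test[, cname], c(0.95), na.rm = TRUE)
  # skip NA rows: a logical index containing NA cannot be used for assignment
  test[!is.na(test[, cname]) & test[, cname] > quantiles[1], cname] <- quantiles[1]
}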
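
As an alternative to an explicit loop, the same capping can probably be written with pmin() and lapply(). This is only a sketch run against the example data from the question. The helper name cap_at_p95 and the regular expression used to pick the columns are assumptions made for illustration, not part of the original answer.

```r
# Example data from the question (only the _1 and _2 suffixes are present).
test <- data.frame(hhid   = 1:5,
                   j_q3_1 = c(1000, 1500, 2000, 5000, 10000),
                   j_q4_1 = c(500, 100, 200, 10000, 200),
                   j_q5_1 = c(200, 300, 400, 203, 100),
                   j_q3_2 = c(300, 10000, 200, 300, 200),
                   j_q4_2 = c(100, 200, 320, 120, 302),
                   j_q5_2 = c(10000, 120, 1222, 300, 2333))

# Hypothetical helper: cap values above the 95th percentile at that percentile.
cap_at_p95 <- function(x) {
  p95 <- quantile(x, 0.95, na.rm = TRUE, names = FALSE)
  pmin(x, p95)  # NA entries stay NA
}

# Pick the q3 and q4 columns by name, whatever numeric suffixes exist.
cols <- grep("^j_q[34]_[0-9]+$", names(test), value = TRUE)

# Apply the helper to each selected column and write the results back.
test[cols] <- lapply(test[cols], cap_at_p95)
```

Selecting columns by pattern means the code does not need to know whether the suffixes run to 2 or to 14, and the q5 columns are left untouched because they do not match the pattern.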