imprela - 26 days ago 4x

R Question

I have some outliers in my dataset that I want to cap at the 95th percentile. The variables of interest are named:

`j_q3_1, j_q3_2,...,j_q3_14`

`j_q4_1, j_q4_2,...,j_q4_14`

Example data (only the first two items, `_1` and `_2`, of each variable):

`test <- data.frame(hhid = c(1:5), j_q3_1 =c(1000,1500,2000,5000,10000), j_q4_1=c(500,100,200,10000,200), j_q5_1 =c(200,300,400,203,100), j_q3_2 =c(300,10000,200,300,200), j_q4_2=c(100,200,320,120,302), j_q5_2=c(10000,120,1222,300,2333))`

This code works, but I have to repeat it for every variable:

`quantiles <- quantile(test$j_q3_1,c(0.95))`

`test$j_q3_1[test$j_q3_1 > quantiles[1]] <- quantiles[1]`

`quantiles <- quantile(test$j_q4_1,c(0.95))`

`test$j_q4_1[test$j_q4_1 > quantiles[1]] <- quantiles[1]`

`quantiles <- quantile(test$j_q3_2,c(0.95))`

`test$j_q3_2[test$j_q3_2 > quantiles[1]] <- quantiles[1]`

`quantiles <- quantile(test$j_q4_2,c(0.95))`

`test$j_q4_2[test$j_q4_2 > quantiles[1]] <- quantiles[1]`

Answer

You can do it like this:

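```
# Loop over question numbers (i) and item suffixes (j); use 1:14 for j on the full data
for (i in 3:4) for (j in 1:2) {
  cname <- paste0("j_q", i, "_", j)
  quantiles <- quantile(test[, cname], c(0.95))
  test[test[, cname] > quantiles[1], cname] <- quantiles[1]
}
```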
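As a quick check of what this capping does, here is a small sketch on the `j_q3_1` values from the question's example data. It assumes `quantile()` is left at its default interpolation (type 7), under which the 95th percentile of five values falls between the two largest. The names `x`, `cap`, and `capped` are only illustrative.

```r
# Values of j_q3_1 from the question's example data
x <- c(1000, 1500, 2000, 5000, 10000)

# 95th percentile with R's default (type 7) interpolation:
# 0.2 * 5000 + 0.8 * 10000 = 9000
cap <- unname(quantile(x, 0.95))

# Replace every value above the cap with the cap itself
capped <- x
capped[capped > cap] <- cap

print(capped)
```

Only the largest value changes (from 10000 to about 9000); everything at or below the cap is left as it was.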

If you have NA values:

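```
for (i in 3:4) for (j in 1:2) {  # use 1:14 for j on the full data
  cname <- paste0("j_q", i, "_", j)
  quantiles <- quantile(test[, cname], c(0.95), na.rm = TRUE)
  # which() drops NA comparisons, so rows with missing values are left untouched
  test[which(test[, cname] > quantiles[1]), cname] <- quantiles[1]
}
```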
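An alternative to building each column name by hand is to select the columns by pattern and cap them all in one step. The following is only a sketch: `cap_at_quantile` is a hypothetical helper, not part of the original answer, the regular expression assumes the `j_q3_*` / `j_q4_*` naming from the question, and the example data has been trimmed and given one `NA` for illustration.

```r
# Example data based on the question, with one NA added for illustration
test <- data.frame(
  hhid   = 1:5,
  j_q3_1 = c(1000, 1500, 2000, 5000, 10000),
  j_q4_1 = c(500, 100, 200, 10000, NA),
  j_q3_2 = c(300, 10000, 200, 300, 200),
  j_q4_2 = c(100, 200, 320, 120, 302)
)

# Hypothetical helper: cap a numeric vector at its p-th quantile
cap_at_quantile <- function(x, p = 0.95) {
  cap <- quantile(x, p, na.rm = TRUE, names = FALSE)
  # pmin() lowers anything above the cap and keeps NA as NA
  pmin(x, cap)
}

# Pick every j_q3_* and j_q4_* column by name, then cap each one
cols <- grep("^j_q[34]_[0-9]+$", names(test), value = TRUE)
test[cols] <- lapply(test[cols], cap_at_quantile)
```

Because the columns are matched by name, the same call should cover all 14 items on the full data without changing the loop bounds, and columns such as `hhid` are not touched.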

Source (Stack Overflow)
