Gustavo Gustavo - 3 months ago 11
R Question

R: replace the current column (value) with random extreme values, below the 12.5% quantile or above the 87.5% quantile

I have a data set with 10 rows and one column (value). Example data:

value <- c(40.557669, 44.436873, 18.541628, 16.524613, 19.34,
           10.07, 17.33, 20.155240, 15.31, 101.23)

data <- data.frame(value)


Using quantile() I can select values based on the 25%, 50%, and 75% quantiles.

For example:

# keep the values at or above the 75% quantile (4th element of quantile())
newvalue <- data$value[data$value >= quantile(data$value)[4]]
# resample them with replacement to fill the whole column
data$value <- sample(newvalue, nrow(data), replace = TRUE)


I would like to replace the current values with random extreme values: those below the 12.5% quantile or above the 87.5% quantile.

What is the best way to do that?

Thank you!

Answer

I was having issues with your provided dataset, so let's make this reproducible. Start with a data.frame with one column, value, of 50 random integers:

set.seed(4)
df <- data.frame(value = sample(1:100, 50))

Get the 12.5% and 87.5% quantiles:

ntiles <- quantile(df$value, probs = c(0.125, 0.875))
# ntiles
#  12.5%  87.5% 
# 19.625 85.500 

Now subset the data.frame into the lower extremes and upper extremes:

lowers <- subset(df, value < ntiles[1])
uppers <- subset(df, value > ntiles[2])
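
The same split can also be written with one logical index instead of two subset() calls. This is only a sketch on the data generated above; the names is_extreme and extremes are made up for illustration:

```r
set.seed(4)
df <- data.frame(value = sample(1:100, 50))
ntiles <- quantile(df$value, probs = c(0.125, 0.875))

# TRUE for rows strictly below the 12.5% quantile or strictly above the 87.5% quantile
is_extreme <- df$value < ntiles[1] | df$value > ntiles[2]

# Pool of extreme values, in their original row order
extremes <- df$value[is_extreme]
```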

Finally, sample from the combined group of lowers$value and uppers$value:

sample(c(lowers$value, uppers$value), NROW(df), replace = TRUE)

I used NROW(df) (which is 50 here) so that the sample has the same number of records as the original dataset.
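
To actually overwrite the column, as the question asks, the steps can be wrapped in a small helper and assigned back. The sketch below is one possible way to do it, not the only one. The function name replace_with_extremes and its arguments are hypothetical, and it uses the question's 10 example values:

```r
# Hypothetical helper: replace every element of x with a random draw
# from the values of x that lie outside the given quantiles
replace_with_extremes <- function(x, lower = 0.125, upper = 0.875) {
  cuts <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  pool <- x[!is.na(x) & (x < cuts[1] | x > cuts[2])]
  if (length(pool) == 0) stop("no values fall outside the requested quantiles")
  # Draw positions rather than calling sample(pool, ...) directly:
  # sample() treats a single number n as 1:n, which would bite if pool had length 1
  pool[sample.int(length(pool), length(x), replace = TRUE)]
}

value <- c(40.557669, 44.436873, 18.541628, 16.524613, 19.34,
           10.07, 17.33, 20.155240, 15.31, 101.23)
data <- data.frame(value)

set.seed(1)  # only to make the draw repeatable
data$value <- replace_with_extremes(data$value)
```

With these 10 values and the default quantile() type, the pool consists of 10.07, 15.31, 44.436873 and 101.23, so every row ends up holding one of those four numbers.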