Israel Israel - 1 month ago 17
R Question

Error creating samples with mapply

I have a dataframe like this:

df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

# df
#   size_upms electric_mean  gas_mean firewood_mean
# 1       126     0.7167568 0.2735135   0.000000000
# 2       123     0.6478599 0.3226793   0.001111729
# 3       148     0.7263137 0.2595541   0.001791401


I want to draw one sample per row, using that row's parameters, with mapply:

l <- mapply(sample,c("electric","gas","firewood"),df$size_upms,TRUE,
c(df$electric_mean,df$gas_mean,df$firewood_mean))


But I get this error:

#Error in sample.int(length(x), size, replace, prob) :
# too few positive probabilities


However, if I call sample on each row separately, it works:

sample(c("electric","gas","firewood"),df$size_upms[1],TRUE,
c(df$electric_mean[1],df$gas_mean[1],df$firewood_mean[1]))[1:5]
#[1] "gas" "electric" "electric" "gas" "electric"
sample(c("electric","gas","firewood"),df$size_upms[2],TRUE,
c(df$electric_mean[2],df$gas_mean[2],df$firewood_mean[2]))[1:5]
#[1] "electric" "gas" "gas" "gas" "electric"
sample(c("electric","gas","firewood"),df$size_upms[3],TRUE,
c(df$electric_mean[3],df$gas_mean[3],df$firewood_mean[3]))[1:5]
#[1] "electric" "electric" "gas" "electric" "electric"


But I want to use mapply because I need to apply this to a big data frame.

What am I doing wrong?

Answer

Since the operation is row-wise, it is easier to do with apply or lapply. There won't be much difference in performance between mapply and the other apply-style solutions.

lapply(seq_len(nrow(df)), function(i)
    sample(c("electric","gas","firewood"), df$size_upms[i], TRUE,
           c(df$electric_mean[i], df$gas_mean[i], df$firewood_mean[i])))
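
As a quick sanity check on the row-wise approach, the sketch below rebuilds the data frame from the question, runs the same lapply call, and confirms that each element has the requested size and that the observed shares are close to the row's weights. The names cats, res and shares, and the seed value, are my own choices for illustration and are not part of the original post.

```r
df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

cats <- c("electric", "gas", "firewood")  # assumed helper name
set.seed(1)                                # arbitrary seed, only for reproducibility

# One sample per row, using that row's size and weights
res <- lapply(seq_len(nrow(df)), function(i)
  sample(cats, df$size_upms[i], TRUE,
         c(df$electric_mean[i], df$gas_mean[i], df$firewood_mean[i])))

lengths(res)  # 126 123 148

# Observed share of each category, one row of the matrix per row of df
shares <- t(vapply(res,
                   function(s) as.numeric(table(factor(s, levels = cats))) / length(s),
                   numeric(length(cats))))
colnames(shares) <- cats
round(shares, 3)
```

Note that the weights in each row do not sum exactly to 1. That is fine here, because sample treats prob as weights and rescales them internally.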

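The error in the OP's code comes from the concatenation. c(df$electric_mean, df$gas_mean, df$firewood_mean) builds a single vector of length 9, and mapply walks over it one element at a time (the vector of category names is split into single elements in the same way), so each call to sample receives one category and one probability instead of three of each. The call that receives the probability 0 fails with "too few positive probabilities". Here, we instead pass each column as a separate argument and do the concatenation inside the anonymous function. This makes sure that each call uses the values from the same row.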
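
To see where the original call goes wrong, the following sketch lays out the arguments as mapply pairs them after recycling. The names cats, probs, calls and msg are hypothetical and only used for this illustration.

```r
df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

cats  <- c("electric", "gas", "firewood")
probs <- c(df$electric_mean, df$gas_mean, df$firewood_mean)
length(probs)  # 9, so mapply makes 9 calls, not 3

# mapply recycles the shorter arguments to length 9 and pairs them element-wise
calls <- data.frame(x    = rep_len(cats, length(probs)),
                    size = rep_len(df$size_upms, length(probs)),
                    prob = probs)
calls[calls$prob == 0, ]  # the pairing whose only weight is zero

# That single call is enough to trigger the error from the question
msg <- tryCatch(sample("electric", 126, TRUE, 0),
                error = function(e) conditionMessage(e))
msg
```

The other eight calls would succeed, but each would sample from a single category, which is not what was intended either. The Map call below avoids both problems by keeping the three weights of a row together.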

Map(function(x,y, u, w) sample(c("electric","gas","firewood"), x, 
     TRUE, c(y, u, w)), df$size_upms, df$electric_mean, df$gas_mean, df$firewood_mean)
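
Since the question asks for mapply specifically: Map is a thin wrapper around mapply with SIMPLIFY = FALSE, so the same idea can be written with mapply directly. The sketch below is one possible way to do it; draw_row, cats and res_m are names I made up for the example. The constant vector of categories is passed through MoreArgs so that it is not split element-wise.

```r
df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

cats <- c("electric", "gas", "firewood")

# Hypothetical helper: one draw of size n with the three weights of a single row
draw_row <- function(n, e, g, f, x) sample(x, n, replace = TRUE, prob = c(e, g, f))

res_m <- mapply(draw_row,
                df$size_upms, df$electric_mean, df$gas_mean, df$firewood_mean,
                MoreArgs = list(x = cats),   # passed whole to every call
                SIMPLIFY = FALSE)            # always return a list

lengths(res_m)  # 126 123 148
```

SIMPLIFY = FALSE is worth keeping even though the sizes differ here: if every row happened to have the same size, mapply would otherwise collapse the result into a matrix instead of a list.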