user964689 - 1 year ago 211
R Question

# How to create a loop to repeat random sampling procedure in R

I have written some code in R to sample without replacement from 3 separate vectors (list1, list2, list3). I draw 10 items from list1, 20 from list2 and 30 from list3. I then combine the 3 random samples and count how many strings were sampled 2 or 3 times. How would I go about automating this so that I can repeat the sampling 100 times and get a distribution of these counts? For example, I want to see how often the same string is randomly sampled from all three lists.

All input data are lists of thousands of strings like this:

list1:

```
     V1
[1,] "EDA"
[2,] "MGN2"
[3,] "5RSK"
[4,] "NBLN"
```

My current code:

```
sample_list1 <-(sample(list1,10, replace=FALSE))
sample_list2 <-(sample(list2,20, replace=FALSE))
sample_list3 <-(sample(list3,20, replace=FALSE))

combined_randomgenes <- c(list1, list2, list3)
combined_counts <- as.data.frame(table(combined_randomgenes))

overlap_3_lists <- nrow(subset(combined_counts, Freq == 3))
overlap_2_lists <- nrow(subset(combined_counts, Freq == 2))
```

If only 1 string occurred in all 3 random samples, I would expect overlap_3_lists to contain the value 1. I would like to automate this so that I get a distribution of values and can plot a histogram showing how often 0, 1, 2, 3, etc. identical strings are sampled in all 3 lists.

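You'll want to change 20 to 30 in your third sample. Also, `combined_randomgenes` needs to reference the `sample_list1`, `sample_list2` and `sample_list3` vectors rather than the original lists. Then wrap the code in a `for` loop and store the counts from each run. Bonus tips: be wary of using `subset` in a script (its documentation describes it as a convenience function intended for interactive use), and set the seed so that your work is reproducible.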

```
set.seed(1234)  # make the random draws reproducible

# Toy data standing in for the real vectors of strings
list1 <- 1:60
list2 <- 1:60
list3 <- 1:60

n <- 100
# One row per run; the loop fills in the counts
runs <- data.frame(run = 1:n, threes = NA, twos = NA)
for (i in 1:n) {
  sample_list1 <- sample(list1, 10, replace = FALSE)
  sample_list2 <- sample(list2, 20, replace = FALSE)
  sample_list3 <- sample(list3, 30, replace = FALSE)

  combined_randomgenes <- c(sample_list1, sample_list2, sample_list3)
  combined_counts <- as.data.frame(table(combined_randomgenes))

  # Values drawn in all three samples, and in exactly two of them
  runs$threes[i] <- sum(combined_counts$Freq == 3)
  runs$twos[i] <- sum(combined_counts$Freq == 2)
}

runs
hist(runs$threes, 5)
hist(runs$twos, 5)
```
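As a possible variation on the loop above, the same procedure could be wrapped in a function and repeated with `replicate()`. This is only a sketch: `count_overlaps` and `pool` are illustrative names that do not appear in the question, and `pool` is stand-in data to be replaced with the real vectors.

```r
# Sketch only: `count_overlaps` and `pool` are hypothetical names for illustration.
set.seed(1234)

# Stand-in data; swap in the real character vectors.
pool <- as.character(1:60)

# One run: draw the three samples (without replacement, the default) and
# count the values that appear in exactly 3 or exactly 2 of them.
count_overlaps <- function(list1, list2, list3) {
  combined <- c(sample(list1, 10), sample(list2, 20), sample(list3, 30))
  freq <- table(combined)
  c(threes = sum(freq == 3), twos = sum(freq == 2))
}

# replicate() returns a 2 x 100 matrix here; transpose so each row is one run.
runs <- as.data.frame(t(replicate(100, count_overlaps(pool, pool, pool))))

# Tabulate how many runs produced 0, 1, 2, ... triple overlaps.
table(runs$threes)
barplot(table(runs$threes), xlab = "strings in all 3 samples", ylab = "runs")
```

Because the counts are small integers, tabulating them with `table()` and drawing a `barplot()` gives one bar per count value, which may be easier to read than `hist()` with an arbitrary number of breaks.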