hmmmbob hmmmbob - 3 months ago 36
R Question

using replicate central limit t

doing a course and the following is the question.

library(downloader)
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/femaleMiceWeights.csv"
filename <- "femaleMiceWeights.csv"
if(!file.exists("femaleMiceWeights.csv")) download(url,destfile=filename)
dat <- read.csv(filename)


Suppose we are interested in the proportion of times we see a 6 when rolling n=100 die. This is a random variable which we can simulate with x=sample(1:6, n, replace=TRUE) and the proportion we are interested in can be expressed as an average: mean(x==6). Because the die rolls are independent, the CLT applies.

We want to roll n dice 10,000 times and keep these proportions. This random variable (proportion of 6s) has mean p=1/6 and variance p*(1-p)/n. So according to CLT z = (mean(x==6) - p) / sqrt(p*(1-p)/n) should be normal with mean 0 and SD 1. Set the seed to 1, then use replicate to perform the simulation, and report what proportion of times z was larger than 2 in absolute value (CLT says it should be about 0.05).

So i wrote the following :

set.seed(1)
n<-10000
p<-1/6
a<-replicate(n, {
x=sample(1:6, n, replace=TRUE)
z<-(mean(x==6) - p) / sqrt(p*(1-p)/n)

})
> mean(abs(a)>2)
[1] 0.0472


So its wrong but pretty close, anyone see where i went wrong ?

Answer

My answer is marked wrong :( that is why i asked.. so i must have done something wrong :(

Looks like you are doing a homework or something, then you were marked wrong, because your code does not really do what your question asks:

We want to roll n dice 10,000 times and keep these proportions

This is the correct version; don't put n inside sample().

set.seed(1); n <- 10000; p <- 1/6
a <- replicate(n, (mean(sample(1:6, 10000, replace=TRUE)==6) - p) / sqrt(p*(1-p)/10000))

Also, you are not comparing with the right thing.

CLT says it should be about 0.05

No! The theoretic quantity you need compare with is 2 * pnorm(-2), which is 0.04550026. The corresponding quantile for 0.05 is -qnorm(0.025) which is 1.96 not 2.

Let's check what mean(abs(a) > 2) gives.

  • With n = 10000 and set.seed(1), you get 0.0472, already close to true value;
  • With n = 20000 and set.seed(1), you get 0.04565, even closer!
Comments