hmmmbob - 8 months ago 58

R Question

doing a course and the following is the question.

`library(downloader)`

url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/femaleMiceWeights.csv"

filename <- "femaleMiceWeights.csv"

if(!file.exists("femaleMiceWeights.csv")) download(url,destfile=filename)

dat <- read.csv(filename)

Suppose we are interested in the proportion of times we see a 6 when rolling n=100 die. This is a random variable which we can simulate with x=sample(1:6, n, replace=TRUE) and the proportion we are interested in can be expressed as an average: mean(x==6). Because the die rolls are independent, the CLT applies.

We want to roll n dice 10,000 times and keep these proportions. This random variable (proportion of 6s) has mean p=1/6 and variance p*(1-p)/n. So according to CLT z = (mean(x==6) - p) / sqrt(p*(1-p)/n) should be normal with mean 0 and SD 1. Set the seed to 1, then use replicate to perform the simulation, and report what proportion of times z was larger than 2 in absolute value (CLT says it should be about 0.05).

So i wrote the following :

`set.seed(1)`

n<-10000

p<-1/6

a<-replicate(n, {

x=sample(1:6, n, replace=TRUE)

z<-(mean(x==6) - p) / sqrt(p*(1-p)/n)

})

> mean(abs(a)>2)

[1] 0.0472

So its wrong but pretty close, anyone see where i went wrong ?

Answer

My answer is marked wrong :( that is why i asked.. so i must have done something wrong :(

Looks like you are doing a homework or something, then you were marked wrong, because your code does not really do what your question asks:

We want to roll n dice 10,000 times and keep these proportions

This is the correct version; don't put `n`

inside `sample()`

.

```
set.seed(1); n <- 10000; p <- 1/6
a <- replicate(n, (mean(sample(1:6, 10000, replace=TRUE)==6) - p) / sqrt(p*(1-p)/10000))
```

Also, you are not comparing with the right thing.

CLT says it should be about 0.05

No! The theoretic quantity you need compare with is `2 * pnorm(-2)`

, which is `0.04550026`

. The corresponding quantile for `0.05`

is `-qnorm(0.025)`

which is `1.96`

not `2`

.

Let's check what `mean(abs(a) > 2)`

gives.

- With
`n = 10000`

and`set.seed(1)`

, you get`0.0472`

, already close to true value; - With
`n = 20000`

and`set.seed(1)`

, you get`0.04565`

, even closer!