Jennifer - 9 months ago 22

R Question

I want to simulate the problem below in R and calculate the average probability based on 1000 simulations -

*Scores on a test are normally distributed with mean 70 and std dev 10.
Estimate the probability that among 75 randomly selected students at least 22 score greater than 78*

This is what I have done so far

`set.seed(1)`

scores = rnorm(1000,70,10)

head(scores)

hist(scores)

sm75=sample(scores,75)

length(sm75[sm75>78])/75

#[1] 0.1866667

However, this only gives me only one iteration, I want 1000 iterations and then take the average of those 1000 probabilities. I believe some kind of control structure using for loop can be implemented. Also, is there an easier way through "apply" family of functions?

Answer Source

At the end of the day you are testing whether at least 22 students score higher than 78, which can be compactly computed with:

```
sum(rnorm(75, 70, 10) > 78) >= 22
```

Breaking this down a bit, `rnorm(75, 70, 10)`

returns the 75 scores, which are normally distributed with mean 70 and standard deviation 10. `rnorm(75, 70, 10) > 78`

is a vector of length 75 that indicates whether or not each of these scores is above 78. `sum(rnorm(75, 70, 10) > 78)`

converts each true to a 1 and each false to a 0 and sums these values up, meaning it counts the number of the 75 scores that exceed 78. Lastly we test whether the sum is 22 or higher with the full expression above.

`replicate`

can be used to replicate this any number of times. So to see the breakdown of 1000 simulations, you can use the following 1-liner (after setting your random seed, of course):

```
set.seed(144)
table(replicate(1000, sum(rnorm(75, 70, 10) > 78) >= 22))
# FALSE TRUE
# 936 64
```

In 64 of the replicates, at least 22 students scored above a 78, so we estimate the probability to be 6.4%.