I want to simulate the problem below in R and calculate the average probability based on 1000 simulations -
Scores on a test are normally distributed with mean 70 and std dev 10.
Estimate the probability that among 75 randomly selected students at least 22 score greater than 78
This is what I have done so far
scores = rnorm(1000,70,10)
At the end of the day you are testing whether at least 22 students score higher than 78, which can be compactly computed with:
sum(rnorm(75, 70, 10) > 78) >= 22
Breaking this down a bit,
rnorm(75, 70, 10) returns the 75 scores, which are normally distributed with mean 70 and standard deviation 10.
rnorm(75, 70, 10) > 78 is a vector of length 75 that indicates whether or not each of these scores is above 78.
sum(rnorm(75, 70, 10) > 78) converts each true to a 1 and each false to a 0 and sums these values up, meaning it counts the number of the 75 scores that exceed 78. Lastly we test whether the sum is 22 or higher with the full expression above.
replicate can be used to replicate this any number of times. So to see the breakdown of 1000 simulations, you can use the following 1-liner (after setting your random seed, of course):
set.seed(144) table(replicate(1000, sum(rnorm(75, 70, 10) > 78) >= 22)) # FALSE TRUE # 936 64
In 64 of the replicates, at least 22 students scored above a 78, so we estimate the probability to be 6.4%.