Ben - 2 months ago 11

R Question

Here's what I tried, making use of the mvtnorm package

`library(mvtnorm)`

set.seed(2357)

df <- data.frame(

x = rnorm(1000, mean=80, sd=20),

y = rnorm(1000, mean=0, sd=5),

z = rnorm(1000, mean=0, sd=5)

)

head(df)

x y z

1 70.38 1.307 0.2005

2 59.76 5.781 -3.5095

3 54.14 -1.313 -1.9022

4 79.91 7.754 -6.2076

5 87.07 1.389 1.1065

6 75.89 1.684 6.2979

`# Get the dimension means and correlation matrix`

means <- c(x=mean(df$x), y=mean(df$y), z=mean(df$z))

corr <- cor(df)

# Check P(x <= 80)

sum(df$x <= 80)/nrow(df) # 0.498

pmvnorm(lower=-Inf, upper=c(80, Inf, Inf), mean=means, corr=corr) # 0.8232

Why is the fitted result 0.82? Where did I go wrong?

Answer

First, you don't need to simulate anything to study the `pmvnorm`

function:

```
pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=c(80,0,0), corr=diag(rep(1,3)))
```

The result is `0.5`

, as you expected.

Your means vector is approximately `(79, 0, 0)`

, so let's try it:

```
pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=c(79,0,0), corr=diag(rep(1,3)))
```

The result now is `0.8413447`

. *There's nothing the matter.* By specifying only the correlation matrix, you told the software to assume that *all variances were unity.* In your simulation, the variances were 400, 25, and 25: very different from what you specified in the arguments!

**The correct calculation uses the covariance matrix of the data, not its correlation matrix:**

```
pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=means, sigma=cov(df))
```

The result is `0.5178412`

, quite in keeping with the data.