Ben - 1 year ago 106
R Question

# How to calculate multivariate normal distribution function in R

Here's what I tried, making use of the mvtnorm package

# Sample Dataset

``````library(mvtnorm)

set.seed(2357)
df <- data.frame(
x = rnorm(1000, mean=80, sd=20),
y = rnorm(1000, mean=0, sd=5),
z = rnorm(1000, mean=0, sd=5)
)

x      y       z
1 70.38  1.307  0.2005
2 59.76  5.781 -3.5095
3 54.14 -1.313 -1.9022
4 79.91  7.754 -6.2076
5 87.07  1.389  1.1065
6 75.89  1.684  6.2979
``````

# Fit multivariate normal dist and check P(x <= 80) ~ 0.5

``````# Get the dimension means and correlation matrix
means <- c(x=mean(df\$x), y=mean(df\$y), z=mean(df\$z))
corr <- cor(df)

# Check P(x <= 80)
sum(df\$x <= 80)/nrow(df)  # 0.498
pmvnorm(lower=-Inf, upper=c(80, Inf, Inf), mean=means, corr=corr)  # 0.8232
``````

Why is the fitted result 0.82? Where did I go wrong?

First, you don't need to simulate anything to study the `pmvnorm` function:

``````pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=c(80,0,0), corr=diag(rep(1,3)))
``````

The result is `0.5`, as you expected.

Your means vector is approximately `(79, 0, 0)`, so let's try it:

``````pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=c(79,0,0), corr=diag(rep(1,3)))
``````

The result now is `0.8413447`. There's nothing the matter. By specifying only the correlation matrix, you told the software to assume that all variances were unity. In your simulation, the variances were 400, 25, and 25: very different from what you specified in the arguments!

The correct calculation uses the covariance matrix of the data, not its correlation matrix:

``````pmvnorm(lower=rep(-Inf, 3), upper=c(80, Inf, Inf), mean=means, sigma=cov(df))
``````

The result is `0.5178412`, quite in keeping with the data.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download