tzi - 9 months ago

R Question

I would like to generate *k* variables from a multivariate normal distribution with a pre-specified mean and standard deviation for each variable and a fixed correlation across the *k* variables.

I tried to do the following:

```
set.seed(10)
library(MASS)
k=10 #number of variables
mu <- rep(1,k) #mean of each variable
nobs <- 10000 #number of observations
sd <- rep(c(1,5),each=5) #standard deviation of each variable
cor <- 0.9 #correlation across variables
M <- matrix(cor,nrow=k,ncol=k) #variance covariance matrix
diag(M) <- sd^2 #desired standard deviations
data <- mvrnorm(nobs,mu,Sigma=M) #generate data
```

My problem is that I get the desired means and standard deviations, but the correlations are far from the desired value:

```
mean(cor(data))
#[1] 0.3774926
```
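A side note on the summary statistic: `cor(data)` is a k × k matrix whose diagonal is all 1s, so `mean(cor(data))` averages those ten 1s together with the 90 off-diagonal entries. A minimal sketch of a helper that averages only the distinct pairwise correlations (the name `mean_offdiag` is hypothetical, not from the original post):

```r
# Hypothetical helper: average of the off-diagonal entries of a
# correlation matrix, i.e. the k*(k-1)/2 distinct pairwise correlations.
mean_offdiag <- function(C) {
  mean(C[upper.tri(C)])
}

# Example on a 3 x 3 correlation matrix with every pair at 0.9
C <- matrix(0.9, nrow = 3, ncol = 3)
diag(C) <- 1
mean(C)          # 0.9333333, pulled up by the diagonal 1s
mean_offdiag(C)  # 0.9
```

Applied to simulated data, `mean_offdiag(cor(data))` gives the average pairwise correlation directly.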

I guess imposing specific standard deviations restricts the possible correlations I can obtain.

Is this indeed the case?

If so, is there any way to get closer to the desired correlations?
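One way to check what the matrix `M` above actually asks for is to convert it to the correlation matrix it implies. Because the 0.9 off the diagonal is a covariance, the implied correlation between variables i and j is 0.9 / (sd_i * sd_j): 0.9 for two variables with sd 1, 0.18 across the two groups, and 0.036 for two variables with sd 5. A minimal sketch using base R's `cov2cor()` (package stats), rebuilding `M` exactly as in the question's code; `R_implied` is just an illustrative name:

```r
# Rebuild the question's matrix (no simulation needed)
k  <- 10
sd <- rep(c(1, 5), each = 5)
M  <- matrix(0.9, nrow = k, ncol = k)  # 0.9 everywhere...
diag(M) <- sd^2                        # ...except the variances

# cov2cor() rescales a covariance matrix to correlations:
# R_implied[i, j] = M[i, j] / (sd_i * sd_j)
R_implied <- cov2cor(M)

R_implied[1, 2]  # both sd = 1: 0.9
R_implied[1, 7]  # sd = 1 vs sd = 5: 0.18
R_implied[6, 7]  # both sd = 5: 0.036
mean(R_implied)  # 0.3772, diagonal included
```

That 0.3772 is what the simulated `mean(cor(data))` of 0.3774926 is estimating. So the standard deviations themselves do not limit the correlation here; the matrix simply encodes a different correlation structure than intended.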

Answer

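Try this: build the covariance matrix from your requirements using the definition `cov(x,y) = cor(x,y)*sigma_x*sigma_y`. In your code the off-diagonal 0.9 entered `M` as a covariance, not a correlation, so it only corresponds to a correlation of 0.9 between variables that both have standard deviation 1: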

```
M <- matrix(cor,nrow=k,ncol=k)*outer(sd,sd) # covariance: cor * sd_i * sd_j
diag(M) <- sd^2 # variances on the diagonal (the product above puts 0.9*sd^2 there)
data <- mvrnorm(nobs,mu,Sigma=M) #generate data
mean(cor(data)) # note: this average includes the diagonal 1s
#[1] 0.9102620391642199
```
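The answer's construction is the elementwise form of the usual identity Σ = D R D, where D = diag(sd) and R is the target correlation matrix. The sketch below assumes the question's `k` and `sd`, and renames the 0.9 to a hypothetical `rho` so it does not mask `cor()`. It builds the covariance matrix both ways and checks that converting it back with `cov2cor()` recovers the intended correlations:

```r
# Assumes k and sd as in the question; rho replaces the question's `cor`
k   <- 10
sd  <- rep(c(1, 5), each = 5)
rho <- 0.9

# Answer's elementwise construction
M1 <- matrix(rho, nrow = k, ncol = k) * outer(sd, sd)
diag(M1) <- sd^2

# Matrix form: Sigma = D %*% R_target %*% D with D = diag(sd)
R_target <- matrix(rho, nrow = k, ncol = k)
diag(R_target) <- 1
M2 <- diag(sd) %*% R_target %*% diag(sd)

all.equal(M1, M2)                  # TRUE
all.equal(cov2cor(M1), R_target)   # TRUE
```

The 0.910 reported in the answer is what this target implies: averaging ten diagonal 1s with ninety off-diagonal 0.9s gives (10 + 81) / 100 = 0.91.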

Source: Stack Overflow