tzi - 1 year ago 69

R Question

I would like to generate *k* variables from a multivariate normal distribution with a pre-specified mean, standard deviation and fixed correlation across the *k* variables.

I tried to do the following:

```
set.seed(10)
library(MASS)

k <- 10                               # number of variables
mu <- rep(1, k)                       # mean of each variable
nobs <- 10000                         # number of observations
sd <- rep(c(1, 5), each = 5)          # standard deviation of each variable
cor <- 0.9                            # correlation across variables
M <- matrix(cor, nrow = k, ncol = k)  # variance-covariance matrix
diag(M) <- sd^2                       # desired variances (squared standard deviations)
data <- mvrnorm(nobs, mu, Sigma = M)  # generate data
```

My problem is that I get the desired means and standard deviations, but the correlation is far from the desired value:

```
mean(cor(data))
#[1] 0.3774926
```
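One caveat when reading this number: `mean(cor(data))` averages all k-by-k entries, including the k diagonal entries that are always exactly 1. Even a perfect 0.9 equicorrelation matrix therefore averages to 0.91 when k = 10. A minimal sketch of a hypothetical helper, `mean_offdiag()` (not part of any package), that averages only the pairwise correlations:

```r
# Hypothetical helper: average only the off-diagonal correlations.
# upper.tri() selects the entries above the diagonal, i.e. each pair once.
mean_offdiag <- function(C) mean(C[upper.tri(C)])

# An exact equicorrelation matrix with rho = 0.9 and k = 10
k <- 10
C <- matrix(0.9, nrow = k, ncol = k)
diag(C) <- 1

mean(C)          # 0.91: the ten diagonal 1s pull the average up
mean_offdiag(C)  # 0.9
```

On simulated data, `mean_offdiag(cor(data))` gives the average pairwise correlation directly.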

I guess imposing specific standard deviations restricts the possible correlations I can obtain.

Is this indeed the case?

If so, is there any way to get closer to the desired correlations?
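A side note on the question itself: in this code the 0.9 placed off the diagonal is a covariance, not a correlation, so the correlation it implies between variables *i* and *j* is 0.9 / (sd_i * sd_j). The standard deviations do not restrict the attainable correlations as such, since any positive standard deviations can be combined with any valid correlation matrix; the discrepancy comes from how the matrix was filled. A short sketch using base R's `cov2cor()` on the question's matrix makes this visible (the literal 0.9 stands in for the question's `cor` variable):

```r
# Rebuild the covariance matrix from the question (0.9 sits off the diagonal)
k <- 10
sd <- rep(c(1, 5), each = 5)
M <- matrix(0.9, nrow = k, ncol = k)
diag(M) <- sd^2

# cov2cor() (stats package, loaded by default) rescales a covariance matrix
# into the correlation matrix it implies: M[i, j] / (sd[i] * sd[j])
R_implied <- cov2cor(M)

R_implied[1, 2]   # 0.9   (both variables have sd 1)
R_implied[1, 7]   # 0.18  (sd 1 vs sd 5: 0.9 / 5)
R_implied[6, 7]   # 0.036 (both variables have sd 5: 0.9 / 25)
mean(R_implied)   # 0.3772, averaged the same way as mean(cor(data))
```

Averaging the implied matrix the way the question does (diagonal included) gives 0.3772, in line with the simulated 0.3774926, so the simulation reproduced exactly the correlations that `M` encoded.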


Answer

Try this: build the covariance matrix from your requirements, since by definition `cov(x,y) = cor(x,y)*sigma_x*sigma_y`:

```
M <- matrix(cor,nrow=k,ncol=k)*outer(sd,sd) # covariance matrix
diag(M) <- sd^2 # reset the diagonal to the variances (the line above left 0.9*sd^2 there)
data <- mvrnorm(nobs,mu,Sigma=M) #generate data
mean(cor(data)) # averages all entries, including the diagonal 1s
#[1] 0.9102620391642199
```
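The same covariance matrix can be written in matrix form as Sigma = D R D, where D = diag(sd) and R is the target correlation matrix; this is algebraically identical to the `outer()` construction above. A sketch under the question's settings that builds Sigma this way, checks it with `cov2cor()` before sampling, and then measures the off-diagonal correlation (the standard deviations are renamed `sds` here only to avoid masking the `sd()` function):

```r
library(MASS)  # provides mvrnorm()

set.seed(10)
k    <- 10                      # number of variables
mu   <- rep(1, k)               # mean of each variable
nobs <- 10000                   # number of observations
sds  <- rep(c(1, 5), each = 5)  # target standard deviations
rho  <- 0.9                     # target common correlation

# Target correlation matrix: 1 on the diagonal, rho everywhere else
R <- matrix(rho, nrow = k, ncol = k)
diag(R) <- 1

# Sigma = D R D with D = diag(sds), so Sigma[i, j] = R[i, j] * sds[i] * sds[j]
D     <- diag(sds)
Sigma <- D %*% R %*% D

# Sanity check: converting Sigma back to correlations should give R
stopifnot(isTRUE(all.equal(cov2cor(Sigma), R)))

data <- mvrnorm(nobs, mu, Sigma = Sigma)

apply(data, 2, sd)             # sample SDs: near 1 for the first five, near 5 for the rest
mean(cor(data)[upper.tri(R)])  # mean off-diagonal correlation, near 0.9
```

Keeping R as an explicit matrix makes it easy to swap in a non-constant correlation structure later; as long as R is a valid (positive-definite) correlation matrix, any positive standard deviations yield a valid Sigma.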
