tzi tzi - 1 month ago 7
R Question

Generate correlated variables with fixed correlation but varying standard deviation

I would like to generate k variables from a multivariate normal distribution with a pre-specified mean, standard deviation and fixed correlation across the k variables.

I tried to do the following:

set.seed(10)
library(MASS)

k=10 #number of variables
mu <- rep(1,k) #mean of each variable
nobs <- 10000 #number of observations
sd <- rep(c(1,5),each=5) #standard deviation of each variable
cor <- 0.9 #correlation across variables

M <- matrix(cor,nrow=k,ncol=k) #variance covariance matrix
diag(M) <- sd^2 #desired standard deviations

data <- mvrnorm(nobs,mu,Sigma=M) #generate data


My problem is that I get the desired means and standard deviations but the correlation is far from the desired value.

mean(cor(data))
[1] 0.3774926


I guess imposing specific standard deviations restricts the possible correlations I can obtain.

Is this indeed the case?

If so, is there any way to get closer to the desired correlations?

Answer

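No, the standard deviations do not restrict the correlation. The problem is that the off-diagonal entries of Sigma must be covariances, not correlations. By definition, cov(x,y) = cor(x,y)*sigma_x*sigma_y, so scale the correlation by the outer product of the standard deviations: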

M <- matrix(cor,nrow=k,ncol=k)*outer(sd,sd) # covariance matrix
diag(M) <- sd^2 #desired standard deviations

data <- mvrnorm(nobs,mu,Sigma=M) #generate data
mean(cor(data))
#[1] 0.9102620391642199
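
To see why the original matrix gave a low average, it may help to convert both covariance matrices back to the correlations they imply, without simulating anything. The sketch below uses base R's cov2cor() for that. The names sds, rho, M_question, M_answer and off_diag are introduced here for illustration only and are not from the question. Note also that mean(cor(data)) averages the diagonal 1s together with the off-diagonal entries, which is why the values above come out near 0.377 and 0.91 rather than exactly at the implied correlations.

```r
k   <- 10                        # number of variables
sds <- rep(c(1, 5), each = 5)    # standard deviations (illustrative name)
rho <- 0.9                       # target correlation (illustrative name)

# Matrix from the question: 0.9 is placed directly in the covariance slots
M_question <- matrix(rho, nrow = k, ncol = k)
diag(M_question) <- sds^2

# Matrix from the answer: correlation scaled by both standard deviations
M_answer <- matrix(rho, nrow = k, ncol = k) * outer(sds, sds)
diag(M_answer) <- sds^2

# cov2cor() returns the correlation matrix implied by a covariance matrix
R_question <- cov2cor(M_question)
R_answer   <- cov2cor(M_answer)

R_question[1, 2]   # both sd = 1:       0.9 / (1 * 1) = 0.9
R_question[1, 6]   # sd = 1 and sd = 5: 0.9 / (1 * 5) = 0.18
R_question[6, 7]   # both sd = 5:       0.9 / (5 * 5) = 0.036

# Averaging over the whole matrix includes the diagonal 1s
mean(R_question)   # about 0.3772, close to the simulated 0.3774926

# Averaging over off-diagonal entries only recovers the target
off_diag <- row(R_answer) != col(R_answer)
mean(R_answer[off_diag])   # 0.9 up to floating-point error
```

The same off-diagonal mask can be applied to cor(data) to check a simulated sample against the target correlation.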