I am working on a homework assignment, and I noticed that when the dimension of the mean and covariance is very high, multivariate_normal takes a very long time:
import numpy as np
from numpy.random import multivariate_normal
p = 5000
cov_true = np.eye(p)
mean_true = np.zeros(p)
beta_true = multivariate_normal(mean_true, cov_true, size=1).T
What takes so much time?
To account for correlations between variables, NumPy computes the singular value decomposition of your covariance matrix, and this takes the majority of the time (the underlying GESDD routine is Θ(n³) in general, and 5000³ is already quite a lot).
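As a rough check (a minimal sketch, assuming p = 5000 as in the question and that NumPy's default SVD-based path is used), you can time the full sampling call against just the SVD of the covariance matrix; the two should be of comparable magnitude:

import time
import numpy as np

p = 5000
cov = np.eye(p)
mean = np.zeros(p)

t0 = time.perf_counter()
np.random.multivariate_normal(mean, cov, size=1)  # full sampling call
t1 = time.perf_counter()
np.linalg.svd(cov)                                # the decomposition alone
t2 = time.perf_counter()

print(f"multivariate_normal: {t1 - t0:.2f} s")
print(f"svd alone:           {t2 - t1:.2f} s")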
How can things be sped up?
In the simplest case, with all variables independent (i.e. a diagonal covariance matrix), you could just use
from numpy.random import normal
# deviations holds the per-variable standard deviations
sample = normal(means, deviations, len(means))
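For instance, reusing mean_true and cov_true from the question (a sketch; taking the diagonal assumes the covariance really is diagonal), means and deviations could be obtained as:

import numpy as np
from numpy.random import normal

means = mean_true                        # per-variable means
deviations = np.sqrt(np.diag(cov_true))  # sqrt of the covariance diagonal
sample = normal(means, deviations, len(means))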
Otherwise, if your covariance matrix happens to be full rank (and hence, being a covariance matrix, positive-definite), you can replace the SVD with a Cholesky decomposition (still Θ(n³) in general, but with a smaller constant). This works because if Σ = LLᵀ and z ∼ N(0, I), then μ + Lz is distributed as N(μ, LLᵀ) = N(μ, Σ):
from numpy.random import standard_normal
from scipy.linalg import cholesky
# lower=True requests the lower-triangular factor with covariances == L @ L.T,
# which is the one needed for sampling; overwrite_a=True destroys the input.
L = cholesky(covariances, lower=True, check_finite=False, overwrite_a=True)
sample = means + L.dot(standard_normal(len(means)))
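If you want to convince yourself that the lower-triangular factor is the right one, a quick empirical check (a sketch with a small p and a made-up positive-definite matrix, so it runs fast) is to draw many samples and compare their covariance with the original:

import numpy as np
from numpy.random import standard_normal
from scipy.linalg import cholesky

p, n_draws = 50, 200_000
A = np.random.randn(p, p)
cov = A @ A.T + p * np.eye(p)       # an arbitrary positive-definite matrix
means = np.zeros(p)

L = cholesky(cov, lower=True)       # lower-triangular factor
z = standard_normal((p, n_draws))
samples = means[:, None] + L @ z    # each column is one draw

emp_cov = np.cov(samples)           # empirical covariance across the draws
print(np.abs(emp_cov - cov).max())  # small relative to cov's scale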
Cholesky will likely be noticeably faster, but if that's still not sufficient, you could investigate whether your particular matrix can be decomposed analytically, or try a different backing library (such as ACML, MKL, or cuSOLVER).
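As an example of the analytic route: in your own snippet the covariance is np.eye(p), whose Cholesky factor is the identity, so no decomposition needs to be computed at all (a sketch reproducing the question's beta_true):

import numpy as np
from numpy.random import standard_normal

p = 5000
mean_true = np.zeros(p)
# np.eye(p) factors as I @ I.T, so sampling reduces to shifting
# standard normal draws by the mean; reshape gives the same column
# shape as the question's .T.
beta_true = (mean_true + standard_normal(p)).reshape(-1, 1)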