1a1a11a - 4 months ago 30

Python Question

I am working on a homework assignment and I noticed that when the dimension of mean and covariance is very high,

`multivariate_normal`

An example code snippet,

`cov_true = np.eye(p)`

mean_true = np.zeros(p)

beta_true = multivariate_normal(mean_true, cov_true, size=1).T

when

`p=5000`

environment, python3.4 and python3.5, numpy 1.11.0

Is it really a bug or did I miss something?

Answer

**What takes so much time?**

To account for relations between variables NumPy computes the singular value decomposition of your covariance matrix and this takes the majority of the time (the underlying GESDD is in general Θ(n^{3}), and 5000^{3} is already a bit).

**How can things be sped up?**

In the simplest case with all variables independent, you could just use `random.normal`

:

```
from numpy.random import normal
sample = normal(means, deviations, len(means))
```

Otherwise, if your covariance matrix happens to be full rank (hence positive-definite), supplant `svd`

with `cholesky`

(still Θ(n^{3}) in general, but with a smaller constant):

```
from numpy.random import standard_normal
from scipy.linalg import cholesky
l = cholesky(covariances, check_finite=False, overwrite_a=True)
sample = means + l.dot(standard_normal(len(means)))
```

If the matrix may be singular (as is sometimes the case), then either wrap SPSTRF or consider helping with scipy#6202.

Cholesky will likely be noticeably faster, but if that's not sufficient, then further you could research if if it wouldn't be possible to decompose the matrix analytically, or try using a different base library (such as ACML, MKL, or cuSOLVER).