dayum - 6 months ago 48

Python Question

I have a 3D numpy array of shape

`(t,n1,n2)`

`x=np.random.rand(10,2,4)`

I need to calculate another

`3D`

`y`

`(t,n1,n1)`

`y[0] = np.cov[x[0,:,:])`

So, a loopy implementation would be -

`y=np.zeros((10,2,2))`

for i in np.arange(x.shape[0]):

y[i]=np.cov(x[i,:,:])

Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing :

`x1= x.swapaxes(1,2)`

y= np.dot(x,x1)

But it didn't work.

Answer

Hacked into `numpy.cov source code`

and tried using the default parameters. As it turns out, `np.cov(x[i,:,:])`

would be simply :

```
N = x.shape[2]
m = x[i,:,:]
m -= np.sum(m, axis=1, keepdims=True) / N
cov = np.dot(m, m.T) /(N - 1)
```

So, the task was to vectorize this loop that would iterate through `i`

and process all of the data from `x`

in one go. For the same, we could use `broadcasting`

at the third step. For the final step, we are performing `sum-reduction`

there along all slices in first axis. This could be efficiently implemented in a vectorized manner with `np.einsum`

. Thus, the final implementation came to this -

```
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
```

**Runtime test**

```
In [155]: def original_app(x):
...: n = x.shape[0]
...: y = np.zeros((n,2,2))
...: for i in np.arange(x.shape[0]):
...: y[i]=np.cov(x[i,:,:])
...: return y
...:
...: def proposed_app(x):
...: N = x.shape[2]
...: m1 = x - x.sum(2,keepdims=1)/N
...: out = np.einsum('ijk,ilk->ijl',m1,m1) / (N - 1)
...: return out
...:
In [156]: # Setup inputs
...: n = 10000
...: x = np.random.rand(n,2,4)
...:
In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True # Results verified
In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop
In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
```

Huge speedup there!