I'm trying to compute a normalized sum of the outer products of the rows of a 60000x100 matrix. I would like to do it the numpy way, since my current solution is constrained by the Python for loop in the list comprehension:
def covariance_over_time(X):
    B = np.sum(np.array([np.outer(x, x) for x in X]), axis=0)
    B = np.true_divide(B, len(X))
    return B
You could simply use matrix multiplication with np.dot:

    B = X.T.dot(X)

Then normalize with np.true_divide(B, len(X)).
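As a quick sanity check, here is a minimal sketch (with dimensions shrunk from 60000x100 to keep it fast; the small random matrix is just a stand-in) showing that the matrix-multiplication route agrees with the original outer-product loop:

```python
import numpy as np

# Small stand-in for the 60000x100 matrix
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))

# Original approach: sum of outer products of the rows, normalized
B_loop = np.sum(np.array([np.outer(x, x) for x in X]), axis=0)
B_loop = np.true_divide(B_loop, len(X))

# Vectorized approach: one matrix multiplication, then normalize
B_fast = np.true_divide(X.T.dot(X), len(X))

print(np.allclose(B_loop, B_fast))
```

The equivalence holds because summing `np.outer(x, x)` over all rows `x` of `X` is exactly the definition of `X.T @ X`, entry by entry.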
If you still encounter memory errors, there are two more options.
I. Full loopy solution
We could loop through the second axis (columns) of X and perform matrix multiplication between every pair of columns using two loops. Since X has only 100 columns, the full loopy solution iterates only 100 x 100 = 10000 times, and each iteration performs a single sum-reduction over 60000 elements (the number of rows in X).
n = X.shape[1]
out = np.empty((n, n), dtype=X.dtype)
for i in range(n):
    for j in range(n):
        # Dot product of two columns: one sum-reduction over the rows
        out[i, j] = X[:, i].dot(X[:, j])
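For completeness, a self-contained sketch of the double loop (again with shrunk, illustrative dimensions), checked against the direct product:

```python
import numpy as np

# Small stand-in for the 60000x100 matrix
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))

n = X.shape[1]
out = np.empty((n, n), dtype=X.dtype)
for i in range(n):
    for j in range(n):
        # Each entry is a dot product of two columns of X
        out[i, j] = X[:, i].dot(X[:, j])

print(np.allclose(out, X.T.dot(X)))
```

Note that the result is symmetric, so one could also compute only `j >= i` and mirror the entries, roughly halving the work.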
II. Hybrid solution

A compromise between the full loopy solution and the fully vectorized one listed at the start would be to use one loop that performs matrix multiplication between each column and the entire array. Each iteration then performs 100 sum-reductions over 60000 elements each, i.e. 60000 x 100 = 6000000 multiply-add operations.
n = X.shape[1]
out = np.empty((n, n), dtype=X.dtype)
for i in range(n):
    # One full row of the result per iteration: column i against all columns
    out[i] = X[:, i].dot(X)
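Putting it together as a drop-in replacement for the original function (a sketch with small illustrative sizes; the normalization step from the question is included at the end):

```python
import numpy as np

def covariance_over_time_hybrid(X):
    # One loop over columns; each iteration multiplies one column
    # against the whole array, producing one row of the result.
    n = X.shape[1]
    out = np.empty((n, n), dtype=X.dtype)
    for i in range(n):
        out[i] = X[:, i].dot(X)
    # Normalize by the number of rows, as in the original function
    return np.true_divide(out, len(X))

# Small stand-in for the 60000x100 matrix
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 5))

B = covariance_over_time_hybrid(X)
print(np.allclose(B, X.T.dot(X) / len(X)))
```

This keeps peak memory at the size of the 100x100 output plus one column's worth of temporaries, while still letting NumPy do the per-column reductions in compiled code.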