Martin Martin - 9 days ago 5
Python Question

Multiplying column elements of sparse Matrix

I have a sparse csc matrix with many zero elements for which I would like to compute the product of all column elements for each row.

i.e.:

A = [[1,2,0,0],
[2,0,3,0]]


should be converted to:

V = [[2,
6]]


Using a numpy dense matrix this can be accomplished by replacing all zero values with one values and using
A.prod(1)
. This is however not a option since the dense matrix would be too large.

Is there any way to accomplish this without converting the sparse matrix into a dense one?

Answer

Approach #1: We can use the row indices of the sparse elements as IDs and perform multiplication of the corresponding values of those elements with np.multiply.reduceat to get the desired output.

Thus, an implementation would be -

from scipy import sparse
from scipy.sparse import csc_matrix

r,c,v = sparse.find(a) # a is input sparse matrix
out = np.zeros(a.shape[0],dtype=a.dtype)
unqr, shift_idx = np.unique(r,return_index=1)
out[unqr] = np.multiply.reduceat(v, shift_idx)

Sample run -

In [89]: # Let's create a sample csc_matrix
    ...: A = np.array([[-1,2,0,0],[0,0,0,0],[2,0,3,0],[4,5,6,0],[1,9,0,2]])
    ...: a = csc_matrix(A)
    ...: 

In [90]: a
Out[90]: 
<5x4 sparse matrix of type '<type 'numpy.int64'>'
    with 10 stored elements in Compressed Sparse Column format>

In [91]: a.toarray()
Out[91]: 
array([[-1,  2,  0,  0],
       [ 0,  0,  0,  0],
       [ 2,  0,  3,  0],
       [ 4,  5,  6,  0],
       [ 1,  9,  0,  2]])

In [92]: out
Out[92]: array([ -2,   0,   6, 120,   0,  18])

Approach #2: We are performing bin-based multiplication. We have bin-based summing solution with np.bincount. So, a trick that could be use here would be converting the numbers to logarithmic numbers, perform bin-based summing and then convert back to original format with exponential (reverse of log) and that's it! For negative numbers, we might to add a step or more, but let's see what the implementation be like for non-negative numbers -

r,c,v = sparse.find(a)
out = np.exp(np.bincount(r,np.log(v),minlength = a.shape[0]))
out[np.setdiff1d(np.arange(a.shape[0]),r)] = 0

A sample run with non-negative numbers -

In [118]: a.toarray()
Out[118]: 
array([[1, 2, 0, 0],
       [0, 0, 0, 0],
       [2, 0, 3, 0],
       [4, 5, 6, 0],
       [1, 9, 0, 2]])

In [120]: out  # Using listed code
Out[120]: array([   2.,    0.,    6.,  120.,   18.])