murban murban - 1 month ago 6
Python Question

Vector Matrix product differences between sparse and dense matrix

In a simple vector matrix multiplication I get different results/output formats when using a scipy.sparse matrix instead of a dense matrix. As an example I use the following dense matrix and vector:

import numpy as np
from scipy import sparse
mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)


For the vector matrix product I get the following expected output:

vec.dot(mat) # array([ 1, 5, 13, 25, 16])
mat.T.dot(vec) # array([ 1, 5, 13, 25, 16])
mat.T.dot(vec.T) # array([ 1, 5, 13, 25, 16])


I understand that it makes no difference whether the 1-D vector is transposed or not. But when I replace the matrix mat by a sparse matrix mat_sparse, I obtain an array of sparse 4x5 matrices, each one being the sparse matrix multiplied by one vector component, i.e. [1x mat_sparse, 2x mat_sparse, ...]:


mat_sparse = sparse.lil_matrix(mat)
vec.dot(mat_sparse) # array([ <4x5 sparse matrix of type '<type 'numpy.int64'>' with 8 stored elements in LInked List format>, ...], dtype=object)


Using the transposed matrix trick I obtain the expected result:

mat_sparse.T.dot(vec.T) # array([ 1, 5, 13, 25, 16])


Can someone explain why this behaviour is expected/wanted? Replacing the matrix mat (which is actually a 2D array) by an instance of np.matrix(mat) does not change the results.

Answer

As a general rule don't count on numpy functions and methods to work right with sparse matrices. It is better to use the sparse methods and functions. Regular numpy code does not know anything about sparse matrices.

With a matrix (sparse or np.matrix), * is matrix multiplication.

In [2150]: vec*smat    # smat=csr_matrix(mat)
Out[2150]: array([ 1,  5, 13, 25, 16], dtype=int32)

In this context the sparse matrix definition of the * takes precedence.

In [2151]: vec.dot(smat)
Out[2151]:...
array([ <4x5 sparse matrix of type '<class 'numpy.int32'>'
    with 8 stored elements in Compressed Sparse Row format>,
    ...
    with 8 stored elements in Compressed Sparse Row format>], dtype=object)

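In this expression, vec.dot does not know anything about the sparse matrix. Offhand it looks like it treats smat as a single opaque object and multiplies each element of vec by it, which would match the [1x mat_sparse, 2x mat_sparse, ...] output described in the question, but I'd have to dig further.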
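A minimal sketch for checking what NumPy makes of the sparse matrix, assuming the same mat and vec as in the question and smat = csr_matrix(mat) as above. The names wrapped and out are only illustrative. On the versions I am aware of, np.asarray wraps the sparse matrix in a 0-d object array, so the dot degenerates into an element-wise multiply; treat this as an observation, not documented behaviour.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

# What NumPy sees when it is handed the sparse matrix:
# a single Python object, not a 4x5 numeric array.
wrapped = np.asarray(smat)
print(wrapped.ndim, wrapped.dtype)

# ndarray.dot therefore multiplies each element of vec by the whole matrix.
out = vec.dot(smat)
print(out.shape, out.dtype)

# Each entry is itself a sparse matrix, e.g. the second one is 2 * smat.
print(out[1].toarray())
```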

The following works because it uses a sparse definition of dot, the same as its *:

In [2163]: smat.T.dot(vec)
Out[2163]: array([ 1,  5, 13, 25, 16], dtype=int32)
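For comparison, here is a short sketch of the sparse-aware spellings side by side, using the question's mat and vec. It assumes smat is a csr_matrix (the matrix interface); with the newer sparse array classes such as csr_array, * is element-wise, so only the .dot and @ forms would carry over.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

expected = vec.dot(mat)        # dense reference result

via_star = vec * smat          # the sparse matrix's * is a matrix product
via_dot = smat.T.dot(vec)      # the sparse .dot method
via_matmul = smat.T @ vec      # @ is also handled by the sparse class

# All three agree with the dense computation.
for result in (via_star, via_dot, via_matmul):
    assert np.array_equal(result, expected)
```

In each case the sparse class, not NumPy, decides how the product is computed.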

np.dot has a limited understanding of sparse matrices. For example, it works if both arguments are sparse: np.dot(smat, smat.T) works (it is the sparse counterpart of np.dot(mat, mat.T)), as does the following:

In [2177]: np.dot(smat.T,sparse.csr_matrix(vec).T).toarray()
Out[2177]: 
array([[ 1],
       [ 5],
       [13],
       [25],
       [16]], dtype=int32)

It may help to read up on how sparse matrices are created and store their data. They are not subclasses of np.ndarray.
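If a plain NumPy function really has to be used, one option is to convert explicitly first. The sketch below, again using the question's mat and vec, shows that the sparse matrix is not an ndarray and that toarray() produces one. It assumes the matrix is small enough to densify, which defeats the purpose of sparse storage for large problems.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

# The sparse matrix is a separate class, not an ndarray subclass.
is_ndarray = isinstance(smat, np.ndarray)
print(is_ndarray)

# Explicit conversion gives NumPy something it understands.
dense = smat.toarray()
result = np.dot(vec, dense)
print(result)
```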
