murban - 1 month ago 6
Python Question

# Vector Matrix product differences between sparse and dense matrix

In a simple vector matrix multiplication I get different results/output formats when using a scipy.sparse matrix instead of a dense matrix. As an example I use the following dense matrix and vector:

``````import numpy as np
from scipy import sparse
mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
``````

For the vector matrix product I get the following expected output:

``````vec.dot(mat)   # array([ 1,  5, 13, 25, 16])
mat.T.dot(vec) # array([ 1,  5, 13, 25, 16])
mat.T.dot(vec.T) # array([ 1,  5, 13, 25, 16])
``````

I accept that it makes no difference whether the vector is transposed (for a 1-D array, `.T` is a no-op). But when I replace the matrix `mat` with a sparse matrix `mat_sparse`, the result is an array of sparse 4x5 matrices, each one being the sparse matrix multiplied by a single vector component, i.e. `[1x mat_sparse, 2x mat_sparse, ...]`:

``````mat_sparse = sparse.lil_matrix(mat)
vec.dot(mat_sparse)  # array([ <4x5 sparse matrix of type '<type 'numpy.int64'>' with 8 stored elements in LInked List format>, ...], dtype=object)
``````

Using the transposed matrix trick I obtain the expected result:

``````mat_sparse.T.dot(vec.T)  # array([ 1,  5, 13, 25, 16])
``````

Can someone explain why this behaviour is expected/wanted? Replacing the matrix `mat` (which is actually a 2D array) with an instance of `np.matrix(mat)` does not change the results.

Answer

As a general rule, don't count on numpy functions and methods to work correctly with sparse matrices. It is better to use the sparse methods and functions, because regular numpy code does not know anything about sparse matrices.
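As a minimal sketch of that rule, using the matrix and vector from the question (the variable names `via_dot`, `via_matmul` and `reference` are just illustrative), the product can be driven from the sparse side by transposing the matrix, or checked against a dense copy:

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

# vec . mat equals mat.T . vec, so call the sparse matrix's own dot method.
via_dot = smat.T.dot(vec)

# The @ operator on a sparse matrix also dispatches to sparse code.
via_matmul = smat.T @ vec

# Dense reference: convert explicitly, then use plain numpy.
reference = vec.dot(smat.toarray())

print(via_dot, via_matmul, reference)
```

All three should agree with the dense result shown in the question. Converting with `toarray()` is only reasonable when the matrix is small enough to hold in memory as a dense array.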

With a matrix (sparse or np.matrix), `*` is matrix multiplication.

``````In [2150]: vec*smat    # smat=csr_matrix(mat)
Out[2150]: array([ 1,  5, 13, 25, 16], dtype=int32)
``````

In this context the sparse matrix definition of the `*` takes precedence.

``````In [2151]: vec.dot(smat)
Out[2151]:...
array([ <4x5 sparse matrix of type '<class 'numpy.int32'>'
with 8 stored elements in Compressed Sparse Row format>,
...
with 8 stored elements in Compressed Sparse Row format>], dtype=object)
``````

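In this expression, `vec.dot` does not know anything about the sparse matrix. As far as I can tell, numpy cannot convert `smat` into a regular 2D array, so it wraps it in a 0-dimensional object array and treats it as a scalar. The `dot` then reduces to multiplying each element of `vec` by the whole matrix, which matches the `[1x mat_sparse, 2x mat_sparse, ...]` output in the question.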
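One way to check what `vec.dot` actually sees is to convert the sparse matrix with `np.asarray` and inspect the result. This is a rough sketch, assuming a `csr_matrix` built from the question's data; the names `wrapped` and `result` are illustrative, and the exact behaviour may vary between numpy and scipy versions:

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

# numpy has no way to unpack the sparse storage, so it wraps the
# whole matrix as a single Python object.
wrapped = np.asarray(smat)
print(wrapped.shape, wrapped.dtype)

# dot with that "scalar" object multiplies every vec element by smat.
result = vec.dot(smat)
print(result.shape, result.dtype)

# Each element is the full sparse matrix scaled by one vec component.
for scale, item in zip(vec, result):
    assert sparse.issparse(item)
    assert (item.toarray() == scale * mat).all()
```

If the loop runs without an assertion error, the object array really is `[1*smat, 2*smat, 3*smat, 4*smat]` rather than a row-by-row product.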

The following works because it uses a sparse definition of `dot`, the same as its `*`:

``````In [2163]: smat.T.dot(vec)
Out[2163]: array([ 1,  5, 13, 25, 16], dtype=int32)
``````

`np.dot` has a limited understanding of sparse matrices. For example, it works if both arguments are sparse: `np.dot(smat, smat.T)` returns the sparse equivalent of `np.dot(mat, mat.T)`, and so does this:

``````In [2177]: np.dot(smat.T,sparse.csr_matrix(vec).T).A
Out[2177]:
array([[ 1],
       [ 5],
       [13],
       [25],
       [16]], dtype=int32)
``````

It may help to read up on how sparse matrices are created and store their data. They are not subclasses of `np.ndarray`.
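To make that concrete, here is a small sketch that inspects the CSR form of the question's matrix. The attributes `data`, `indices` and `indptr` are the three arrays a `csr_matrix` stores; the variable names are otherwise illustrative:

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0],
                [0, 2, 2, 0, 0],
                [0, 0, 3, 3, 0],
                [0, 0, 0, 4, 4]])
smat = sparse.csr_matrix(mat)

# A sparse matrix is its own class, not an ndarray subclass.
print(isinstance(smat, np.ndarray))  # False

# CSR keeps only the nonzero values plus two index arrays.
print(smat.data)     # nonzero values, row by row: [1 1 2 2 3 3 4 4]
print(smat.indices)  # column index of each value: [0 1 1 2 2 3 3 4]
print(smat.indptr)   # where each row starts in data: [0 2 4 6 8]

# An explicit conversion is needed to get back to a regular array.
dense = smat.toarray()
print(isinstance(dense, np.ndarray))  # True
```

Because the values live in these separate arrays, numpy functions that expect a contiguous 2D buffer have nothing to operate on unless the sparse class itself provides the method.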
