murban - 6 months ago 41

Python Question

In a simple vector matrix multiplication I get different results/output formats when using a scipy.sparse matrix instead of a dense matrix. As an example I use the following dense matrix and vector:

```
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
```

For the vector matrix product I get the following expected output:

```
vec.dot(mat)      # array([ 1, 5, 13, 25, 16])
mat.T.dot(vec)    # array([ 1, 5, 13, 25, 16])
mat.T.dot(vec.T)  # array([ 1, 5, 13, 25, 16])
```

I accept that it makes no difference whether the vector is transposed or not. But when I replace the matrix `mat` with a sparse matrix `mat_sparse`, I get an object array of the form `[1x mat_sparse, 2x mat_sparse, ...]`:

```
mat_sparse = sparse.lil_matrix(mat)
vec.dot(mat_sparse) # array([ <4x5 sparse matrix of type '<type 'numpy.int64'>' with 8 stored elements in LInked List format>, ...], dtype=object)
```

Using the transposed matrix trick I obtain the expected result:

```
mat_sparse.T.dot(vec.T) # array([ 1, 5, 13, 25, 16])
```

Can someone explain why this behaviour is expected/wanted? Replacing the matrix `mat` with `np.matrix(mat)`

Answer

As a general rule don't count on numpy functions and methods to work right with sparse matrices. It is better to use the sparse methods and functions. Regular numpy code does not know anything about sparse matrices.

With a matrix (sparse or `np.matrix`), `*` is matrix multiplication.

```
In [2150]: vec*smat # smat=csr_matrix(mat)
Out[2150]: array([ 1, 5, 13, 25, 16], dtype=int32)
```

In this context the sparse matrix definition of `*` takes precedence.
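A minimal sketch of that precedence, assuming `smat` is a `scipy.sparse.csr_matrix` (the matrix-style class used in this answer; with the newer `csr_array` class, `*` is elementwise instead). The `__array_priority__` lookup is only shown for inspection; that it drives the hand-over is my understanding, not something stated above.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

# ndarray * sparse matrix: numpy hands the operation over to the sparse
# matrix, which treats * as a matrix product (vector times matrix here).
left = vec * smat

# sparse matrix * ndarray: the sparse definition of * is used directly.
right = smat.T * vec

print(left)
print(right)

# Printed for inspection only; None if the attribute is absent.
print(getattr(smat, "__array_priority__", None))
```

Both expressions go through the sparse class, which is why neither returns an object array.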

```
In [2151]: vec.dot(smat)
Out[2151]:...
array([ <4x5 sparse matrix of type '<class 'numpy.int32'>'
with 8 stored elements in Compressed Sparse Row format>,
...
with 8 stored elements in Compressed Sparse Row format>], dtype=object)
```

In this expression, `vec.dot` does not know anything about the sparse matrix. Offhand it looks like it is performing the `dot` separately with each row of `smat`, but I'd have to dig further.
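One way to dig further is to inspect the result directly. The sketch below assumes `smat` is a `csr_matrix`. As far as I can tell, `vec.dot(smat)` treats the sparse matrix as a single opaque object, so each entry of the result is the whole matrix scaled by one element of `vec` (the `[1x mat_sparse, 2x mat_sparse, ...]` pattern from the question) and not a row-wise `dot`. This may vary between NumPy/SciPy versions.

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

out = vec.dot(smat)
print(out.dtype, out.shape)

# Compare every entry of the result with k * mat for the matching k in vec.
matches = [np.array_equal(entry.toarray(), k * mat) for k, entry in zip(vec, out)]
print(matches)
```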

The following works because it uses a sparse definition of `dot`, the same as its `*`:

```
In [2163]: smat.T.dot(vec)
Out[2163]: array([ 1, 5, 13, 25, 16], dtype=int32)
```
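The same product can be written so that the sparse matrix is always the object whose method is called. A short sketch, assuming `smat = csr_matrix(mat)` as in this answer; `@` is Python's matrix-multiplication operator:

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

via_dot = smat.T.dot(vec)   # sparse .dot method
via_matmul = smat.T @ vec   # sparse @ operator
dense = vec.dot(mat)        # dense reference result

print(via_dot, via_matmul, dense)
```

Keeping the sparse matrix on the left means the sparse implementation is chosen without relying on numpy to defer to it.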

`np.dot` has a limited understanding of sparse matrices. For example, it works if both arguments are sparse. `np.dot(smat, smat.T)` works (same as `np.dot(mat, mat.T)`).

```
In [2177]: np.dot(smat.T,sparse.csr_matrix(vec).T).A
Out[2177]:
array([[ 1],
       [ 5],
       [13],
       [25],
       [16]], dtype=int32)
```
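For products of two sparse matrices, the sparse operators avoid relying on `np.dot` at all. A sketch under the same assumptions (`smat` is a `csr_matrix`), using `.toarray()`, the long form of `.A`, to get dense results for comparison:

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
vec = np.arange(1, 5)
smat = sparse.csr_matrix(mat)

col = sparse.csr_matrix(vec).T   # 4x1 sparse column vector
prod = smat.T @ col              # 5x1 sparse result
gram = smat @ smat.T             # 4x4 sparse result

print(prod.toarray().ravel())
print(np.array_equal(gram.toarray(), mat @ mat.T))
```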

It may help to read up on how sparse matrices are created and store their data. They are not subclasses of `np.ndarray`.
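To illustrate with the CSR format used above: a `csr_matrix` keeps only the nonzero values plus two index arrays, and it fails an `isinstance` check against `np.ndarray`. A small sketch for inspection:

```python
import numpy as np
from scipy import sparse

mat = np.array([[1, 1, 0, 0, 0], [0, 2, 2, 0, 0], [0, 0, 3, 3, 0], [0, 0, 0, 4, 4]])
smat = sparse.csr_matrix(mat)

print(isinstance(smat, np.ndarray))  # not an ndarray subclass

print(smat.data)     # stored nonzero values, row by row
print(smat.indices)  # column index of each stored value
print(smat.indptr)   # offsets into data/indices where each row starts
```

Functions written for `ndarray` never see these three arrays, which is why they cannot be expected to do the right thing with a sparse matrix.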