malisit - 10 months ago 70

Python Question

I'm porting some MATLAB code to Python with numpy and scipy, and I need the numpy/scipy equivalent of MATLAB's `sparse` function.

Here's the usage of the sparse function in MATLAB,

`sparse([3; 2], [2; 4], [3; 0])`

I have these, but they don't work,

`sps.csr_matrix([3, 2], [2, 4], [3, 0])`

`sps.csr_matrix(np.array([[3], [2]]), np.array([[2], [4]]), np.array([[3], [0]]))`

`sps.csr_matrix([[3], [2]], [[2], [4]], [[3], [0]])`

Any ideas?

Thanks.

Edit:

Here's the output of the MATLAB version; it doesn't match what scipy gave:

`Trial>> m = sparse([3; 2], [2; 4], [3; 0])`

m =

(3,2) 3

Trial>> full(m)

ans =

0 0 0 0

0 0 0 0

0 3 0 0

Answer Source

You're using the `sparse(I, J, SV)` form [note: link goes to documentation for GNU Octave, not Matlab]. The `scipy.sparse` equivalent is `csr_matrix((SV, (I, J)))` (yes, a single argument which is a 2-tuple containing a vector and a 2-tuple of vectors). You also have to correct the index vectors, because Python consistently uses 0-based indexing while Matlab uses 1-based indexing.

```
>>> m = sps.csr_matrix(([3,0], ([2,1], [1,3]))); m
<3x4 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> m.todense()
matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 3, 0, 0]], dtype=int64)
```

Note that scipy, unlike Matlab, does not automatically discard explicit zeroes, and will use integer storage for matrices containing only integers. To perfectly match the matrix you got in Matlab, you must explicitly ask for floating-point storage and you must call `eliminate_zeros()` on the result:

```
>>> m2 = sps.csr_matrix(([3,0], ([2,1], [1,3])), dtype=np.float64)
>>> m2.eliminate_zeros()
>>> m2
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
>>> m2.todense()
matrix([[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.]])
```

You could also change `[3,0]` to `[3., 0.]`, but I recommend an explicit `dtype=` argument because that will prevent surprises when you are feeding in real data.
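If you have many `sparse(I, J, SV)` calls to port, it may be convenient to wrap the steps above in a small helper. The following is only a sketch: `matlab_sparse` is a made-up name, not part of scipy, and it assumes the inputs are 1-based index vectors (row or column vectors) exactly as you would pass them to Matlab.

```python
import numpy as np
import scipy.sparse as sps


def matlab_sparse(i, j, v):
    """Hypothetical helper approximating Matlab's sparse(I, J, SV)."""
    # Flatten so that both [3, 2] and [[3], [2]] are accepted,
    # then shift from Matlab's 1-based to Python's 0-based indices.
    rows = np.asarray(i).ravel() - 1
    cols = np.asarray(j).ravel() - 1
    # Force floating-point storage, as Matlab does.
    vals = np.asarray(v, dtype=np.float64).ravel()
    m = sps.csr_matrix((vals, (rows, cols)))
    # Drop explicitly stored zeros, as Matlab does.
    m.eliminate_zeros()
    return m


m = matlab_sparse([3, 2], [2, 4], [3, 0])
print(m.shape)      # (3, 4)
print(m.nnz)        # 1
print(m.toarray())
```

As in the examples above, the shape is inferred from the largest row and column index, which is why the result is 3x4.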

(I don't know what Matlab's internal sparse matrix representation is, but Octave appears to default to compressed sparse *column* representation. The difference between CSC and CSR should only affect performance. If your NumPy code winds up being slower than your Matlab code, try using `sps.csc_matrix` instead of `csr_matrix`, as well as all the usual NumPy performance tips.)
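To illustrate that the choice of format only changes the storage layout and not the contents, here is a minimal sketch that builds the same matrix both ways and converts between them. The variable names are just for illustration.

```python
import numpy as np
import scipy.sparse as sps

rows, cols, vals = [2, 1], [1, 3], [3.0, 0.0]

# Same (values, (rows, cols)) argument, two storage formats.
csr = sps.csr_matrix((vals, (rows, cols)), dtype=np.float64)
csc = sps.csc_matrix((vals, (rows, cols)), dtype=np.float64)

# The dense contents are identical; only the internal layout differs.
same = np.array_equal(csr.toarray(), csc.toarray())
print(csr.format, csc.format, same)

# An existing matrix can also be converted after construction.
converted = csr.tocsc()
print(converted.format)
```

Which format is faster depends on the access pattern (row slicing versus column slicing, for example), so it is worth timing both on your real data.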

(You probably need to read NumPy for Matlab users if you haven't already.)