malisit - 2 days ago
Python Question

# numpy/scipy equivalent of MATLAB's sparse function

I'm porting MATLAB code to Python with numpy and scipy, and I need the numpy/scipy equivalent of MATLAB's sparse function.

Here's how the sparse function is used in MATLAB:

sparse([3; 2], [2; 4], [3; 0])

I've tried these, but none of them work:

sps.csr_matrix([3, 2], [2, 4], [3, 0])
sps.csr_matrix(np.array([[3], [2]]), np.array([[2], [4]]), np.array([[3], [0]]))
sps.csr_matrix([[3], [2]], [[2], [4]], [[3], [0]])

Any ideas?
Thanks.

Edit:
Here's the output of the MATLAB version; it doesn't look like what scipy gave me:

Trial>> m = sparse([3; 2], [2; 4], [3; 0])

m =

(3,2) 3

Trial>> full(m)

ans =

0 0 0 0
0 0 0 0
0 3 0 0

You're using the sparse(I, J, SV) form [note: link goes to documentation for GNU Octave, not Matlab]. The scipy.sparse equivalent is csr_matrix((SV, (I, J))) -- yes, a single argument which is a 2-tuple containing a vector and a 2-tuple of vectors. You also have to subtract 1 from every entry of the index vectors, because Python uses 0-based indexing while Matlab uses 1-based indexing.

>>> m = sps.csr_matrix(([3,0], ([2,1], [1,3]))); m
<3x4 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>

>>> m.todense()
matrix([[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 3, 0, 0]], dtype=int64)

Note that scipy, unlike Matlab, does not automatically discard explicit zeroes, and will use integer storage for matrices containing only integers. To perfectly match the matrix you got in Matlab, you must explicitly ask for floating-point storage and you must call eliminate_zeros() on the result:

>>> m2 = sps.csr_matrix(([3,0], ([2,1], [1,3])), dtype=np.float64)
>>> m2.eliminate_zeros()
>>> m2
<3x4 sparse matrix of type '<class 'numpy.float64'>'
with 1 stored elements in Compressed Sparse Row format>
>>> m2.todense()
matrix([[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  3.,  0.,  0.]])

You could also change [3,0] to [3., 0.] but I recommend an explicit dtype= argument because that will prevent surprises when you are feeding in real data.
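If you have many such calls to port, the steps above (shift the indices to 0-based, force floating-point storage, drop explicit zeros) can be bundled into one small wrapper. This is only a sketch: the name matlab_sparse and its optional shape parameter are my own invention, not part of scipy, and it assumes the inputs are 1-based index vectors exactly as you would pass them to Matlab.

```python
import numpy as np
import scipy.sparse as sps


def matlab_sparse(i, j, v, shape=None):
    """Hypothetical helper that mimics Matlab's sparse(I, J, SV).

    i, j  : 1-based row and column indices (row or column vectors)
    v     : values, same number of elements as i and j
    shape : optional (rows, cols); inferred from the largest index if omitted
    """
    # Flatten column vectors such as [[3], [2]] and shift to 0-based indexing.
    rows = np.asarray(i).ravel() - 1
    cols = np.asarray(j).ravel() - 1
    vals = np.asarray(v, dtype=np.float64).ravel()

    m = sps.csr_matrix((vals, (rows, cols)), shape=shape, dtype=np.float64)
    # scipy keeps explicitly stored zeros; remove them in place.
    m.eliminate_zeros()
    return m


# Same arguments as the Matlab call sparse([3; 2], [2; 4], [3; 0])
m = matlab_sparse([[3], [2]], [[2], [4]], [[3], [0]])
print(m.shape, m.nnz)
print(m.toarray())
```

Repeated (row, column) pairs end up summed when the matrix is built this way, which is also how Matlab's sparse treats repeated subscripts, so no extra handling should be needed for that case.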

(I don't know what Matlab's internal sparse matrix representation is, but Octave appears to default to compressed sparse column representation. The difference between CSC and CSR should only affect performance. If your NumPy code winds up being slower than your Matlab code, try using sps.csc_matrix instead of csr_matrix, as well as all the usual NumPy performance tips.)
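For what it's worth, switching formats is cheap to try. The sketch below builds the same matrix as CSR and as CSC, and also converts an existing CSR matrix with tocsc(); the variable names are just for illustration. Whether CSC is actually faster depends on what your code does with the matrix (roughly, column-oriented access tends to favour CSC and row-oriented access tends to favour CSR), so measure before committing.

```python
import numpy as np
import scipy.sparse as sps

# Same data as in the examples above, already shifted to 0-based indices.
vals, rows, cols = [3.0, 0.0], [2, 1], [1, 3]

csr = sps.csr_matrix((vals, (rows, cols)), dtype=np.float64)
csc = sps.csc_matrix((vals, (rows, cols)), dtype=np.float64)

# An existing matrix can also be converted after the fact.
csc_from_csr = csr.tocsc()

print(csr.format, csc.format, csc_from_csr.format)

# The storage layout differs, but the matrices hold the same values.
same = np.array_equal(csr.toarray(), csc.toarray())
print(same)
```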

(You probably need to read NumPy for Matlab users if you haven't already.)