Raphael - 1 year ago 309
Python Question

# Python - Efficient Function with scipy sparse Matrices

for a project, I need an efficient function in python that solves to following task:

Given a very large List X of long sparse Vectors (=> big sparse Matrix) and another Matrix Y that contains a single Vector y, I want a List of "distances", that y has to every Element of X. Hereby the "distance" is defined like this:

Compare each Element of the two Vectors, always take the lower one and sum them up.

Example:

``````X = [[0,0,2],
[1,0,0],
[3,1,0]]

Y = [[1,0,2]]
``````

The function should return dist = [2,1,1]

In my project, both X and Y contain a lot of zeros and come in as an instance of:

``````<class 'scipy.sparse.csr.csr_matrix'>
``````

So far so good and I managed to write a functions that solves this task, but is very slow and horrible inefficient. I need some tips on how to efficienty process/iterate the sparse Matrices.
This is my function:

``````def get_distances(X, Y):
Ret=[]
rows, cols = X.shape

for i in range(0,rows):
dist = 0
sample = X.getrow(i).todense()
test = Y.getrow(0).todense()
rows_s, cols_s = sample.shape
rows_t, cols_t = test.shape

for s,t in zip(range(0, cols_s), range(0, cols_t)):
dist += min(sample[0,s], test[0,t])

X_ret.append([dist])

return ret
``````

To do my Operations, I convert the sparse matrices to dense matrices which is of course horrible, but I did not know how to do it better. Do you know how to improve my code and make the function faster?

Thank you a lot!

I revised your function and ran it in

``````import numpy as np
from scipy import sparse

def get_distances(X, Y):
ret=[]
for row in X:
sample = row.A
test = Y.getrow(0).A
dist = np.minimum(sample[0,:], test[0,:]).sum()
ret.append(dist)
return ret

X = [[0,0,2],
[1,0,0],
[3,1,0]]

Y = [[1,0,2]]

XM = sparse.csr_matrix(X)
YM = sparse.csr_matrix(Y)

print( get_distances(XM,YM))

print (np.minimum(XM.A, YM.A).sum(axis=1))
``````

producing

``````1255:~/mypy\$ python3 stack37056258.py
[2, 1, 1]
[2 1 1]
``````

`np.minimum` takes element wise minimum of two arrays (may be 2d), so I don't need to iterate on columns. I also don't need to use indexing.

`minimum` is also implemented for sparse matrices, but I get a segmenation error when I try to apply it to your `X` (with 3 rows) and `Y` (with 1). If they are the same size this works:

``````Ys = sparse.vstack((YM,YM,YM))
print(Ys.shape)
print (XM.minimum(Ys).sum(axis=1))
``````

Converting the single row matrix to an array also gets around the error - because it ends up using the dense version, `np.minimum(XM.todense(), YM.A)`.

``````print (XM.minimum(YM.A).sum(axis=1))
``````

When I try other element by element operations on these 2 matrices I get `ValueError: inconsistent shapes`, e.g. `XM+YM`, or `XM<YM`. Looks like sparse does not implement broadcasting as `numpy` arrays does.

=======================

There's an overlap in issues with Scipy sparse matrix alternative for getrow()

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download