rocknrollnerd - 1 year ago 136

Python Question

I'm trying to compute a simple dot product but leave nonzero values from the original matrix unchanged. A toy example:

`import numpy as np`

A = np.array([[2, 1, 1, 2],

[0, 2, 1, 0],

[1, 0, 1, 1],

[2, 2, 1, 0]])

B = np.array([[ 0.54331039, 0.41018682, 0.1582158 , 0.3486124 ],

[ 0.68804647, 0.29520239, 0.40654206, 0.20473451],

[ 0.69857579, 0.38958572, 0.30361365, 0.32256483],

[ 0.46195299, 0.79863505, 0.22431876, 0.59054473]])

Desired outcome:

`C = np.array([[ 2. , 1. , 1. , 2. ],`

[ 2.07466874, 2. , 1. , 0.73203386],

[ 1. , 1.5984076 , 1. , 1. ],

[ 2. , 2. , 1. , 1.42925865]])

The actual matrices in question, however, are sparse and look more like this:

`A = sparse.rand(250000, 1700, density=0.001, format='csr')`

B = sparse.rand(1700, 1700, density=0.02, format='csr')

One simple way go would be just setting the values using mask index, like that:

`mask = A != 0`

C = A.dot(B)

C[mask] = A[mask]

However, my original arrays are sparse and quite large, so changing them via index assignment is painfully slow. Conversion to lil matrix helps a bit, but again, conversion itself takes a lot of time.

The other obvious approach, I guess, would be just resort to iteration and skip masked values, but I'd like not to throw away the benefits of numpy/scipy-optimized array multiplication.

`B`

`A`

`C`

`from scipy import sparse`

import numpy as np

def naive(A, B):

mask = A != 0

out = A.dot(B).tolil()

out[mask] = A[mask]

return out.tocsr()

def proposed(A, B):

Az = A == 0

R, C = np.where(Az)

out = A.copy()

out[Az] = np.einsum('ij,ji->i', A[R], B[:, C])

return out

%timeit naive(A, B)

1 loops, best of 3: 4.04 s per loop

%timeit proposed(A, B)

/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py:215: SparseEfficiencyWarning: Comparing a sparse matrix with 0 using == is inefficient, try using != instead.

/usr/local/lib/python2.7/dist-packages/scipy/sparse/coo.pyc in __init__(self, arg1, shape, dtype, copy)

173 self.shape = M.shape

174

--> 175 self.row, self.col = M.nonzero()

176 self.data = M[self.row, self.col]

177 self.has_canonical_format = True

MemoryError:

Couldn't make anything more or less useful out of Cython, at least without going too far away from Python. The idea was to leave the dot product to scipy and just try to set those original values as fast as possible, something like this:

`cimport cython`

@cython.cdivision(True)

@cython.boundscheck(False)

@cython.wraparound(False)

cpdef coo_replace(int [:] row1, int [:] col1, float [:] data1, int[:] row2, int[:] col2, float[:] data2):

cdef int N = row1.shape[0]

cdef int M = row2.shape[0]

cdef int i, j

cdef dict d = {}

for i in range(M):

d[(row2[i], col2[i])] = data2[i]

for j in range(N):

if (row1[j], col1[j]) in d:

data1[j] = d[(row1[j], col1[j])]

This was a bit better then my pre-first "naive" implementation (using

`.tolil()`

Answer Source

A possibly cleaner and faster version of your `naive`

code:

```
In [57]: r,c=A.nonzero() # this uses A.tocoo()
In [58]: C=A*B
In [59]: Cl=C.tolil()
In [60]: Cl[r,c]=A.tolil()[r,c]
In [61]: Cl.tocsr()
```

`C[r,c]=A[r,c]`

gives an efficiency warning, but I think that's aimed more a people do that kind of assignment in loop.

```
In [63]: %%timeit C=A*B
...: C[r,c]=A[r,c]
...
The slowest run took 7.32 times longer than the fastest....
1000 loops, best of 3: 334 µs per loop
In [64]: %%timeit C=A*B
...: Cl=C.tolil()
...: Cl[r,c]=A.tolil()[r,c]
...: Cl.tocsr()
...:
100 loops, best of 3: 2.83 ms per loop
```

My `A`

is small, only (250,100), but it looks like the round trip to `lil`

isn't a time saver, despite the warning.

Masking with `A==0`

is bound to give problems when `A`

is sparse

```
In [66]: Az=A==0
....SparseEfficiencyWarning...
In [67]: r1,c1=Az.nonzero()
```

Compared to the `nonzero`

`r`

for `A`

, this `r1`

is much larger - the row index of all zeros in the sparse matrix; everything but the 25 nonzeros.

```
In [70]: r.shape
Out[70]: (25,)
In [71]: r1.shape
Out[71]: (24975,)
```

If I index `A`

with that `r1`

I get a much larger array. In effect I am repeating each row by the number of zeros in it

```
In [72]: A[r1,:]
Out[72]:
<24975x100 sparse matrix of type '<class 'numpy.float64'>'
with 2473 stored elements in Compressed Sparse Row format>
In [73]: A
Out[73]:
<250x100 sparse matrix of type '<class 'numpy.float64'>'
with 25 stored elements in Compressed Sparse Row format>
```

I've increased the shape and number of nonzero elements by roughly 100 (the number of columns).

Defining `foo`

, and copying Divakar's tests:

```
def foo(A,B):
r,c = A.nonzero()
C = A*B
C[r,c] = A[r,c]
return C
In [83]: timeit naive(A,B)
100 loops, best of 3: 2.53 ms per loop
In [84]: timeit proposed(A,B)
/...
SparseEfficiencyWarning)
100 loops, best of 3: 4.48 ms per loop
In [85]: timeit foo(A,B)
...
SparseEfficiencyWarning)
100 loops, best of 3: 2.13 ms per loop
```

So my version has a modest speed inprovement. As Divakar found out, changing sparsity changes the relative advantages. I expect size to also change them.

The fact that `A.nonzero`

uses the `coo`

format, suggests it might be feasible to construct the new array with that format. A lot of `sparse`

code builds a new matrix via the `coo`

values.

```
In [97]: Co=C.tocoo()
In [98]: Ao=A.tocoo()
In [99]: r=np.concatenate((Co.row,Ao.row))
In [100]: c=np.concatenate((Co.col,Ao.col))
In [101]: d=np.concatenate((Co.data,Ao.data))
In [102]: r.shape
Out[102]: (79,)
In [103]: C1=sparse.csr_matrix((d,(r,c)),shape=A.shape)
In [104]: C1
Out[104]:
<250x100 sparse matrix of type '<class 'numpy.float64'>'
with 78 stored elements in Compressed Sparse Row format>
```

This `C1`

has, I think, the same non-zero elements as the `C`

constructed by other means. But I think one value is different because the `r`

is longer. In this particular example, `C`

and `A`

share one nonzero element, and the `coo`

style of input sums those, where as we'd prefer to have `A`

values overwrite everything.

If you can tolerate this discrepancy, this is a faster way (at least for this test case):

```
def bar(A,B):
C=A*B
Co=C.tocoo()
Ao=A.tocoo()
r=np.concatenate((Co.row,Ao.row))
c=np.concatenate((Co.col,Ao.col))
d=np.concatenate((Co.data,Ao.data))
return sparse.csr_matrix((d,(r,c)),shape=A.shape)
In [107]: timeit bar(A,B)
1000 loops, best of 3: 1.03 ms per loop
```