ToneDaBass - 1 year ago 36

Python Question

I need to convert a sparse logical matrix into a list of sets, where element i of the list is the set of row indices with nonzero values in column i. The following code works, but I'm wondering if there's a faster way to do this. The actual data I'm using is approximately 6000x6000 and much sparser than this example.

```
import numpy as np

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])

rows, cols = A.shape

# C[0] holds the row indices and C[1] the column indices of the nonzeros.
C = np.nonzero(A)

# One set per column, filled with the rows that are nonzero in that column.
D = [set() for j in range(cols)]
for i in range(len(C[0])):
    D[C[1][i]].add(C[0][i])

print(D)
```

Answer

If you represent the sparse array as a `csc_matrix`, you can use the `indices` and `indptr` attributes to create the sets.

For example,

```
In [93]: A
Out[93]:
array([[1, 0, 0, 0, 0, 1],
       [0, 1, 1, 1, 1, 0],
       [1, 0, 1, 0, 1, 1],
       [1, 1, 0, 1, 0, 1],
       [1, 1, 0, 1, 0, 0],
       [1, 0, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 0],
       [0, 0, 1, 0, 1, 0]])
In [94]: from scipy.sparse import csc_matrix
In [95]: C = csc_matrix(A)
In [96]: C.indptr
Out[96]: array([ 0, 5, 8, 12, 16, 20, 23], dtype=int32)
In [97]: C.indices
Out[97]: array([0, 2, 3, 4, 5, 1, 3, 4, 1, 2, 6, 7, 1, 3, 4, 6, 1, 2, 6, 7, 0, 2, 3], dtype=int32)
In [98]: D = [set(C.indices[C.indptr[i]:C.indptr[i+1]]) for i in range(C.shape[1])]
In [99]: D
Out[99]:
[{0, 2, 3, 4, 5},
 {1, 3, 4},
 {1, 2, 6, 7},
 {1, 3, 4, 6},
 {1, 2, 6, 7},
 {0, 2, 3}]
```
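As a rough sketch, the session above can be wrapped into a standalone script and checked against the loop from the question. The helper names `column_row_sets` and `column_row_sets_loop` are made up for illustration and are not part of SciPy. The sketch assumes the input is a dense array (or a sparse matrix with no explicitly stored zeros), and it calls `tolist()` so the sets hold plain Python ints rather than NumPy integers.

```python
import numpy as np
from scipy.sparse import csc_matrix


def column_row_sets(A):
    """Hypothetical helper: entry i is the set of rows nonzero in column i."""
    C = csc_matrix(A)
    # C.indptr[i]:C.indptr[i + 1] delimits column i's slice of C.indices.
    # tolist() converts the NumPy integers to plain Python ints.
    return [set(C.indices[C.indptr[i]:C.indptr[i + 1]].tolist())
            for i in range(C.shape[1])]


def column_row_sets_loop(A):
    """The question's approach, kept as a baseline for comparison."""
    rows, cols = np.nonzero(A)
    D = [set() for _ in range(A.shape[1])]
    for r, c in zip(rows.tolist(), cols.tolist()):
        D[c].add(r)
    return D


A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])

D = column_row_sets(A)

# Both approaches should agree on every column.
assert D == column_row_sets_loop(A)
print(D)
```

If the data already lives in a SciPy sparse format, passing it straight to `csc_matrix` avoids building the dense 6000x6000 array first; whether that is actually faster for your data is worth timing rather than assuming.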

For a list of arrays instead of sets, just don't call `set()`:

```
In [100]: [C.indices[C.indptr[i]:C.indptr[i+1]] for i in range(len(C.indptr)-1)]
Out[100]:
[array([0, 2, 3, 4, 5], dtype=int32),
 array([1, 3, 4], dtype=int32),
 array([1, 2, 6, 7], dtype=int32),
 array([1, 3, 4, 6], dtype=int32),
 array([1, 2, 6, 7], dtype=int32),
 array([0, 2, 3], dtype=int32)]
```
```
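The same list of arrays can probably also be produced without an explicit Python loop by splitting `indices` at the interior column boundaries with `np.split`. This is an alternative to the list comprehension above, not something the original answer uses. The small matrix here is a made-up example, chosen so that one column is empty.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Made-up 3x3 example; column 1 has no nonzero entries.
A = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 0, 0]])
C = csc_matrix(A)

# indptr[1:-1] holds the boundaries between consecutive columns,
# so splitting there yields one array of row indices per column.
row_arrays = np.split(C.indices, C.indptr[1:-1])

# Same result via the slicing approach, for comparison.
sliced = [C.indices[C.indptr[i]:C.indptr[i + 1]] for i in range(C.shape[1])]
assert all(np.array_equal(a, b) for a, b in zip(row_arrays, sliced))
assert len(row_arrays) == C.shape[1]
```

An empty column simply comes back as an empty array, so the result always has one entry per column.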