Wilbeibi - 1 year ago 139
Python Question

# Find all-zero columns in a SciPy sparse matrix

For example, I have a `coo_matrix` A:

``````import scipy.sparse as sp
A = sp.coo_matrix([[3,0,3,0],
                   [0,0,2,0],
                   [2,5,1,0],
                   [0,0,0,0]])
``````

How can I get the result [0,0,0,1], which indicates that the first three columns contain non-zero values and only the 4th column is all zeros?

PS: A cannot be converted to another type.

PS2: I tried using `np.nonzero`, but my implementation does not seem very elegant.

Approach #1: We could do something like this -

``````import numpy as np

# Get the column indices of the non-zero entries in the input sparse matrix
C = sp.find(A)[1]

# np.in1d gives a mask of the columns that contain a non-zero value.
# Invert that mask and convert to int dtype for the desired output.
out = (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
``````

Alternatively, to make the code shorter, we can use subtraction -

``````out = 1-np.in1d(np.arange(A.shape[1]),C)
``````

Step-by-step run -

1) Input array and sparse matrix from it :

``````In [137]: arr             # Regular dense array
Out[137]:
array([[3, 0, 3, 0],
       [0, 0, 2, 0],
       [2, 5, 1, 0],
       [0, 0, 0, 0]])

In [138]: A = sp.coo_matrix(arr) # Sparse matrix, used as the input from here on
``````

2) Get non-zero column indices :

``````In [139]: C = sp.find(A)[1]

In [140]: C
Out[140]: array([0, 2, 2, 0, 1, 2], dtype=int32)
``````

3) Use `np.in1d` to get mask of non-zero columns :

``````In [141]: np.in1d(np.arange(A.shape[1]),C)
Out[141]: array([ True,  True,  True, False], dtype=bool)
``````

4) Invert it :

``````In [142]: ~np.in1d(np.arange(A.shape[1]),C)
Out[142]: array([False, False, False,  True], dtype=bool)
``````

5) Finally convert to int dtype :

``````In [143]: (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Out[143]: array([0, 0, 0, 1])
``````

Alternative subtraction approach :

``````In [145]: 1-np.in1d(np.arange(A.shape[1]),C)
Out[145]: array([0, 0, 0, 1])
``````
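For reuse, Approach #1 can be wrapped in a small function. This is only a sketch: the name `all_zero_columns` is made up for illustration, and it uses `np.isin`, which newer NumPy releases recommend over `np.in1d`. The logic is otherwise the same as in the steps above.

```python
import numpy as np
import scipy.sparse as sp


def all_zero_columns(A):
    """Return an int array with 1 for each all-zero column of sparse matrix A.

    Hypothetical helper name, not part of NumPy or SciPy.
    """
    # sp.find returns (rows, cols, values) of the non-zero entries
    nonzero_cols = sp.find(A)[1]
    # Mark every column index that shows up among the non-zero entries
    has_nonzero = np.isin(np.arange(A.shape[1]), nonzero_cols)
    # Invert the mask and convert to int for the 0/1 output
    return (~has_nonzero).astype(int)


A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])
print(all_zero_columns(A))  # [0 0 0 1]
```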

Approach #2: Here's another, possibly faster, way using matrix multiplication -

``````out = 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
``````
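The idea behind Approach #2: multiplying a row vector of ones by the matrix combines the entries of each column, and casting to bool first turns every non-zero entry into `True`, so the product says per column whether any non-zero entry exists. Below is a sketch of the same idea with integer counts, which may be easier to inspect. The variable names are illustrative, and the product is written as `B.T.dot(ones)`, which computes the same vector-matrix product.

```python
import numpy as np
import scipy.sparse as sp

A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])

# Turn every stored non-zero into 1 so values of opposite sign cannot cancel
B = A.astype(bool).astype(np.int64)

# ones * B, written as B.T.dot(ones): one count of non-zero entries per column
ones = np.ones(A.shape[0], dtype=np.int64)
counts = B.T.dot(ones)
print(counts)  # [2 1 3 0]

# A column is all zeros exactly when its count is 0
out = (counts == 0).astype(int)
print(out)  # [0 0 0 1]
```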

Runtime test

``````In [166]: arr = np.random.randint(0,3,(1000,1000))

In [167]: A = sp.coo_matrix(arr)

In [168]: %timeit 1-np.in1d(np.arange(A.shape[1]),sp.find(A)[1])
10 loops, best of 3: 47.3 ms per loop

In [169]: %timeit 1-np.ones(A.shape[0],dtype=bool)*A.astype(bool)
100 loops, best of 3: 12.9 ms per loop
``````
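To rerun the comparison outside IPython, something like the following could be used. It is a sketch built on the standard-library `timeit` module. The function names, the fixed seed, and the column zeroed out as a sanity check are assumptions for illustration, and timings will vary with the machine and library versions.

```python
import timeit

import numpy as np
import scipy.sparse as sp


def approach1(A):
    # Column indices of non-zero entries, then a membership test
    return 1 - np.isin(np.arange(A.shape[1]), sp.find(A)[1])


def approach2(A):
    # Boolean vector-matrix product, written as A.T.dot(ones)
    return 1 - A.astype(bool).T.dot(np.ones(A.shape[0], dtype=bool))


rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
arr = rng.integers(0, 3, (1000, 1000))
arr[:, 7] = 0                       # force one all-zero column as a sanity check
A = sp.coo_matrix(arr)

# Both approaches should agree before their timings are compared
assert np.array_equal(approach1(A), approach2(A))

for fn in (approach1, approach2):
    best = min(timeit.repeat(lambda: fn(A), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```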