Wilbeibi - 2 months ago
Python Question

Find all-zero columns in a SciPy sparse matrix

For example, I have a coo_matrix A:

import scipy.sparse as sp
A = sp.coo_matrix([[3,0,3,0],
                   [0,0,2,0],
                   [2,5,1,0],
                   [0,0,0,0]])


How can I get the result [0,0,0,1], which indicates that the first three columns contain non-zero values and only the fourth column is all zeros?

PS: I cannot convert A to another type.

PS2: I tried using np.nonzero, but my implementation is not very elegant.

Answer

We could do something like this -

import numpy as np
import scipy.sparse as sp

# Get the column indices of the non-zero entries of the input sparse matrix
C = sp.find(A)[1]

# np.in1d gives a mask of the columns that contain non-zero values,
# so we invert it and convert to int dtype for the desired output.
out = (~np.in1d(np.arange(A.shape[1]),C)).astype(int)

Alternatively, to make the code shorter, we can use subtraction -

out = 1-np.in1d(np.arange(A.shape[1]),C)
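
For reference, the lines above can be collected into one self-contained, runnable sketch. The function name `all_zero_columns` is hypothetical (it does not come from the answer), and the sketch uses `np.isin`, the newer spelling of `np.in1d`, on the assumption that a reasonably recent NumPy is installed.

```python
import numpy as np
import scipy.sparse as sp


def all_zero_columns(A):
    """Return a 0/1 array with 1 for every column of A that has no non-zero entry.

    Hypothetical helper name; it wraps the sp.find + membership-test idea.
    """
    # sp.find returns (row indices, column indices, values) of the non-zero entries
    cols = sp.find(A)[1]
    # True where a column index appears among the non-zero entries
    has_nonzero = np.isin(np.arange(A.shape[1]), cols)
    # Invert the mask and convert it to int
    return (~has_nonzero).astype(int)


A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])
result = all_zero_columns(A)
print(result)  # [0 0 0 1]
```

Wrapping the steps in a function only makes the snippet easier to reuse; the logic is the same as in the two-line version.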

Step-by-step run

1) Input array and sparse matrix from it :

In [137]: arr             # Regular dense array
Out[137]: 
array([[3, 0, 3, 0],
       [0, 0, 2, 0],
       [2, 5, 1, 0],
       [0, 0, 0, 0]])

In [138]: A = sp.coo_matrix(arr) # Convert to a sparse matrix, used as the input from here on

2) Get non-zero column indices :

In [139]: C = sp.find(A)[1]

In [140]: C
Out[140]: array([0, 2, 2, 0, 1, 2], dtype=int32)

3) Use np.in1d to get mask of non-zero columns :

In [141]: np.in1d(np.arange(A.shape[1]),C)
Out[141]: array([ True,  True,  True, False], dtype=bool)

4) Invert it :

In [142]: ~np.in1d(np.arange(A.shape[1]),C)
Out[142]: array([False, False, False,  True], dtype=bool)

5) Finally convert to int dtype :

In [143]: (~np.in1d(np.arange(A.shape[1]),C)).astype(int)
Out[143]: array([0, 0, 0, 1])

Alternative subtraction approach :

In [145]: 1-np.in1d(np.arange(A.shape[1]),C)
Out[145]: array([0, 0, 0, 1])
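
As a rough alternative sketch (not part of the answer above), the same result can probably be read straight from the `col` and `data` attributes of a `coo_matrix`, without calling `sp.find`. It assumes `A` is already in COO format. The `data != 0` filter is there because a COO matrix may store explicit zeros; duplicate entries that cancel each other out are not handled here.

```python
import numpy as np
import scipy.sparse as sp

A = sp.coo_matrix([[3, 0, 3, 0],
                   [0, 0, 2, 0],
                   [2, 5, 1, 0],
                   [0, 0, 0, 0]])

# Start by assuming every column is all zeros ...
out = np.ones(A.shape[1], dtype=int)
# ... then clear the flag for each column that holds a stored non-zero value.
out[A.col[A.data != 0]] = 0
print(out)  # [0 0 0 1]
```

This keeps `A` as a `coo_matrix` throughout, which fits the constraint in the question that `A` cannot be converted to another type.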