gzc gzc -4 years ago 249
Python Question

numpy most efficient way to get row index of true values by columns

I want to get the row indices of the True values in each column of a 2-D ndarray. So far, I have a solution that uses a for loop, but I don't think it is efficient because it relies on a native Python for loop. I tried to come up with a vectorized solution but failed.

Update: The solution doesn't have to be vectorized; anything more efficient than the loop is welcome.

import numpy as np

arr = np.random.randint(2, size=15).reshape((3,5)).astype(bool)
print(arr)

[[ True False True False True]
[False True False True True]
[ True True False False True]]

def calc(matrix):
    result = []
    for i in range(matrix.shape[1]):
        result.append(np.argwhere(matrix[:, i]).flatten().tolist())
    return result

print(calc(arr))
[[0, 2], [1, 2], [0], [1], [0, 1, 2]]


Note: I want the row indices grouped by column. When a column is all False, I need an empty list [] for it instead of having the column skipped.

Answer Source

Approach #1

Here's one vectorized NumPy approach to have those row indices grouped in a list of arrays -

r,c = np.where(arr.T)
out = np.split(c, np.flatnonzero(r[1:] != r[:-1])+1)

Sample run -

In [63]: arr = np.random.randint(2, size=15).reshape((3,5)).astype(bool)

In [64]: arr
Out[64]: 
array([[False, False,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True, False, False,  True]], dtype=bool)

In [65]: r,c = np.where(arr.T)

In [66]: np.split(c, np.flatnonzero(r[1:] != r[:-1])+1)
Out[66]: [array([1, 2]), array([1, 2]), array([0]), array([0]), array([1, 2])]

In [67]: calc(arr)
Out[67]: [[1, 2], [1, 2], [0], [0], [1, 2]]
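
To make the intermediate steps easier to follow, here is a minimal self-contained sketch of Approach #1 run on the array from the question. The helper name group_rows_by_column and the variable names are made up for this illustration. It also shows a limitation worth keeping in mind: a column with no True values leaves no trace in the np.where output, so it produces no group at all.

```python
import numpy as np

def group_rows_by_column(arr):
    # Hypothetical helper wrapping Approach #1.
    # Transposing makes np.where scan column by column, so `cols` comes
    # back sorted and `rows` holds the matching row indices in that order.
    cols, rows = np.where(arr.T)
    # Positions where the column index changes mark the group boundaries.
    cuts = np.flatnonzero(cols[1:] != cols[:-1]) + 1
    return np.split(rows, cuts)

# The array printed in the question.
arr = np.array([[True, False, True, False, True],
                [False, True, False, True, True],
                [True, True, False, False, True]])
out = [a.tolist() for a in group_rows_by_column(arr)]
print(out)  # [[0, 2], [1, 2], [0], [1], [0, 1, 2]]

# An array whose columns 1 and 3 are all False: only three groups come back.
sparse = np.array([[True, False, False, False, False],
                   [True, False, False, False, True],
                   [True, False, True, False, True]])
out_sparse = [a.tolist() for a in group_rows_by_column(sparse)]
print(out_sparse)  # [[0, 1, 2], [2], [1, 2]]
```

The second result has three groups for five columns, so the position of a group in the list no longer tells you which column it came from.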

Approach #2

Alternatively, we could use a list comprehension to avoid that splitting -

idx = np.concatenate(([0], np.flatnonzero(r[1:] != r[:-1])+1, [r.size] ))
out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

We are using r,c from approach #1.
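
As a rough, self-contained illustration of Approach #2, the sketch below recomputes r and c and prints the boundary array idx for the array from the question, so the slicing is easier to trace. The input array is taken from the question; everything else follows the two lines above.

```python
import numpy as np

# The array printed in the question.
arr = np.array([[True, False, True, False, True],
                [False, True, False, True, True],
                [True, True, False, False, True]])

# r: column index of each True value (sorted), c: its row index.
r, c = np.where(arr.T)

# Start offsets of each group, framed by 0 and the total count.
idx = np.concatenate(([0], np.flatnonzero(r[1:] != r[:-1]) + 1, [r.size]))
print(idx.tolist())  # [0, 2, 4, 5, 6, 9]

# Each consecutive pair of offsets delimits one column's row indices.
out = [c[idx[i]:idx[i+1]] for i in range(len(idx) - 1)]
print([a.tolist() for a in out])  # [[0, 2], [1, 2], [0], [1], [0, 1, 2]]
```

The slices are views into c rather than copies, which is presumably where any saving over np.split would come from; that is an assumption and has not been timed here.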

Approach #3 (to output empty lists/arrays for all-zero columns)

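To account for all-zero columns, for which we need empty lists/arrays, here's a modified approach. arr.sum(0) counts the True values in each column, and its cumulative sum gives the slice boundaries into c, so a column with no True values yields an empty slice -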

idx = np.concatenate(([0], arr.sum(0).cumsum() ))
out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

We are using c from approach #1.

Sample run -

In [177]: arr
Out[177]: 
array([[ True, False, False, False, False],
       [ True, False, False, False,  True],
       [ True, False,  True, False,  True]], dtype=bool)

In [178]: idx = np.concatenate(([0], arr.sum(0).cumsum() ))
     ...: out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
     ...: 

In [179]: out
Out[179]: 
[array([0, 1, 2]),
 array([], dtype=int64),
 array([2]),
 array([], dtype=int64),
 array([1, 2])]
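
For a quick sanity check, the following sketch wraps Approach #3 in a function and compares it with the loop-based calc from the question, both on the sample array above and on a few random arrays. The name rows_by_column, the seed, and the array shapes are arbitrary choices made for this illustration.

```python
import numpy as np

def calc(matrix):
    # Loop-based baseline from the question.
    return [np.argwhere(matrix[:, i]).flatten().tolist()
            for i in range(matrix.shape[1])]

def rows_by_column(arr):
    # Hypothetical wrapper around Approach #3.
    _, rows = np.where(arr.T)  # row indices, ordered column by column
    # Per-column counts of True values; their running total gives the
    # end offset of every column's slice, including empty ones.
    idx = np.concatenate(([0], arr.sum(0).cumsum()))
    return [rows[idx[i]:idx[i+1]] for i in range(len(idx) - 1)]

# The sample array shown above (columns 1 and 3 are all False).
arr = np.array([[True, False, False, False, False],
                [True, False, False, False, True],
                [True, False, True, False, True]])
out = [a.tolist() for a in rows_by_column(arr)]
print(out)  # [[0, 1, 2], [], [2], [], [1, 2]]

# Compare against the loop on a few random boolean arrays.
rng = np.random.RandomState(0)
all_match = True
for _ in range(20):
    test = rng.randint(2, size=(4, 6)).astype(bool)
    if [a.tolist() for a in rows_by_column(test)] != calc(test):
        all_match = False
print(all_match)
```

Unlike the first two approaches, this one returns exactly one entry per column, so the position in the list identifies the column.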