gzc -4 years ago 249
Python Question

# NumPy: most efficient way to get row indices of True values by column

I want to get the row indices of the True values in each column of a 2-D ndarray. So far, I have a solution with a for loop, but I don't think it is efficient because it relies on a native Python for loop. I tried to find a vectorized solution but failed.

Update: The solution doesn't have to be vectorized; anything more efficient is welcome.

``````
import numpy as np

arr = np.random.randint(2, size=15).reshape((3,5)).astype(bool)
print(arr)

[[ True False  True False  True]
 [False  True False  True  True]
 [ True  True False False  True]]

def calc(matrix):
    result = []
    # Collect the row indices of the True values, one column at a time
    for i in range(matrix.shape[1]):
        result.append(np.argwhere(matrix[:, i]).flatten().tolist())
    return result

print(calc(arr))
[[0, 2], [1, 2], [0], [1], [0, 1, 2]]
``````

Note: I want the row indices grouped by column. When a column is all False, I need to get an empty list `[]`.

Approach #1

Here's one vectorized NumPy approach to have those row indices grouped in a list of arrays -

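``````
# np.where on the transpose lists the True entries column by column:
# r holds the column index in arr, c holds the row index
r,c = np.where(arr.T)
out = np.split(c, np.flatnonzero(r[1:] != r[:-1])+1)
``````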

Sample run -

``````
In [63]: arr = np.random.randint(2, size=15).reshape((3,5)).astype(bool)

In [64]: arr
Out[64]:
array([[False, False,  True,  True, False],
       [ True,  True, False, False,  True],
       [ True,  True, False, False,  True]], dtype=bool)

In [65]: r,c = np.where(arr.T)

In [66]: np.split(c, np.flatnonzero(r[1:] != r[:-1])+1)
Out[66]: [array([1, 2]), array([1, 2]), array([0]), array([0]), array([1, 2])]

In [67]: calc(arr)
Out[67]: [[1, 2], [1, 2], [0], [0], [1, 2]]
``````
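To make the splitting step easier to follow, here is a small sketch that prints the intermediate arrays for the fixed array from the question. The name `cuts` is only introduced here for illustration; it holds the positions where the column index in `r` changes, which is where `c` gets split.

```python
import numpy as np

# Fixed array from the question, so the intermediate values are reproducible
arr = np.array([[True, False, True, False, True],
                [False, True, False, True, True],
                [True, True, False, False, True]])

# Transposing makes np.where walk arr column by column:
# r is the column index in arr, c is the row index
r, c = np.where(arr.T)

# Positions in r where the column index changes (illustrative name)
cuts = np.flatnonzero(r[1:] != r[:-1]) + 1

out = np.split(c, cuts)

print(r)     # [0 0 1 1 2 3 4 4 4]
print(c)     # [0 2 1 2 0 1 0 1 2]
print(cuts)  # [2 4 5 6]
print([a.tolist() for a in out])  # [[0, 2], [1, 2], [0], [1], [0, 1, 2]]
```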

Approach #2

Alternatively, we could use a list comprehension to avoid that splitting -

``````
idx = np.concatenate(([0], np.flatnonzero(r[1:] != r[:-1])+1, [r.size] ))
out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
``````

We are using `r,c` from approach #1.
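As a self-contained sketch of this slicing variant, the snippet below runs it on a hand-picked array that has two all-False columns. Since the boundaries come only from changes in `r`, columns with no True values leave no trace, so the result has fewer groups than the array has columns.

```python
import numpy as np

# Hand-picked array: columns 1 and 3 contain no True values
arr = np.array([[True, False, False, False, False],
                [True, False, False, False, True],
                [True, False, True, False, True]])

r, c = np.where(arr.T)

# Group boundaries: start, every position where the column index changes, end
idx = np.concatenate(([0], np.flatnonzero(r[1:] != r[:-1]) + 1, [r.size]))
out = [c[idx[i]:idx[i + 1]] for i in range(len(idx) - 1)]

print(idx)                        # [0 3 4 6]
print([a.tolist() for a in out])  # [[0, 1, 2], [2], [1, 2]]
print(len(out), arr.shape[1])     # 3 5
```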

Approach #3 (to output empty lists/arrays for all-zero columns)

To account for all-zero columns, for which we need empty lists/arrays, here's a modified approach -

``````
idx = np.concatenate(([0], arr.sum(0).cumsum() ))
out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
``````

We are using `c` from approach #1.

Sample run -

``````
In [177]: arr
Out[177]:
array([[ True, False, False, False, False],
       [ True, False, False, False,  True],
       [ True, False,  True, False,  True]], dtype=bool)

In [178]: idx = np.concatenate(([0], arr.sum(0).cumsum() ))
     ...: out = [c[idx[i]:idx[i+1]] for i in range(len(idx)-1)]
     ...:

In [179]: out
Out[179]:
[array([0, 1, 2]),
 array([], dtype=int64),
 array([2]),
 array([], dtype=int64),
 array([1, 2])]
``````
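For convenience, the pieces of approach #3 could be wrapped into one function and checked against the loop-based `calc` from the question. This is only a sketch: the name `rows_by_column` is made up here, and the `.tolist()` call is added so the output matches the list-of-lists format that `calc` returns. The cumulative sum of the per-column True counts gives one boundary per column, so a column with a count of zero yields an empty slice.

```python
import numpy as np

def calc(matrix):
    # Loop-based reference from the question
    return [np.argwhere(matrix[:, i]).flatten().tolist()
            for i in range(matrix.shape[1])]

def rows_by_column(matrix):
    # Hypothetical wrapper around approach #3
    _, c = np.where(matrix.T)           # row indices, listed column by column
    counts = matrix.sum(0)              # number of True values per column
    idx = np.concatenate(([0], counts.cumsum()))
    return [c[idx[i]:idx[i + 1]].tolist() for i in range(len(idx) - 1)]

arr = np.array([[True, False, False, False, False],
                [True, False, False, False, True],
                [True, False, True, False, True]])

result = rows_by_column(arr)
print(result)               # [[0, 1, 2], [], [2], [], [1, 2]]
print(result == calc(arr))  # True
```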