aph - 1 year ago 104
Python Question

# fancy indexing a numpy matrix: one element per row

I have a 2d numpy array, matrix, of shape (m, n). My actual use-case has m ~ 1e5 and n ~ 100, but for the sake of having a simple minimal example:

```python
import numpy as np
matrix = np.arange(5*3).reshape((5, 3))
```

I have an indexing array of integers, idx, of shape (m,), with each entry in the range [0, n). This array specifies which column should be selected from each row of matrix.

```python
idx = np.array([2, 0, 2, 1, 1])
```

So, I am trying to select column 2 from row 0, column 0 from row 1, column 2 from row 2, column 1 from row 3, and column 1 from row 4. Thus the final answer should be:

```python
correct_result = np.array((2, 3, 8, 10, 13))
```
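
For reference, the selection described above can be spelled out with an explicit Python loop. This is only a slow baseline sketch to pin down the intended semantics, not the vectorized syntax being asked for; `loop_result` is a name introduced here for illustration.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])

# Pick element idx[i] from row i, one row at a time.
loop_result = np.array([matrix[i, idx[i]] for i in range(matrix.shape[0])])
print(loop_result)  # values: 2, 3, 8, 10, 13
```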

I have tried the following, which is intuitive, but incorrect:

```python
incorrect_result = matrix[:, idx]
```

What the above syntax does is apply idx as a fancy indexing array along the column axis for every row, resulting in another matrix of shape (m, m), with one column per entry of idx, rather than a 1-d array of shape (m,). That is not what I want.
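
A quick check on the example arrays illustrates the problem: the result gets one column per entry of idx. The name `wrong` is introduced here purely for illustration.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])

# Selects columns 2, 0, 2, 1, 1 from every row,
# so the output has len(idx) columns.
wrong = matrix[:, idx]
print(wrong.shape)  # (5, 5)

# The wanted values happen to sit on the diagonal of this array,
# since wrong[k, k] == matrix[k, idx[k]].
print(np.diagonal(wrong))
```

Reading the diagonal would technically work, but with m ~ 1e5 the intermediate array would hold about 1e10 elements, so it is not a practical route.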

What is the correct syntax for fancy indexing of this type?

```python
correct_result = matrix[np.arange(m), idx]
```

The advanced indexing expression `matrix[I, J]` gives an output such that `output[k] == matrix[I[k], J[k]]`.
If we want `output[k] == matrix[k, idx[k]]`, then we need `I[k] == k` and `J[k] == idx[k]`, so we need `I` to be `np.arange(m)` (where `m = matrix.shape[0]`) and `J` to be `idx`.
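
Put together with the example arrays from the question, a minimal runnable sketch looks like this. The second approach, `np.take_along_axis` (available in NumPy 1.15 and later), is not part of the answer above and is shown only as an alternative; `rows` and `via_take` are illustrative names.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])
m = matrix.shape[0]

# Pair row index k with column index idx[k].
rows = np.arange(m)
correct_result = matrix[rows, idx]

# Alternative: take_along_axis expects an index array with the same
# number of dimensions as matrix, so idx is reshaped to (m, 1) and
# the (m, 1) result is flattened back to shape (m,).
via_take = np.take_along_axis(matrix, idx[:, None], axis=1).ravel()

print(correct_result)  # values: 2, 3, 8, 10, 13
print(via_take)        # same values
```

Both forms only touch m elements, so they stay cheap for m ~ 1e5.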