aph - 9 months ago 41

Python Question

I have a 2d numpy array, matrix, of shape (m, n). My actual use-case has m ~ 1e5 and n ~ 100, but for the sake of having a simple minimal example:

`matrix = np.arange(5*3).reshape((5, 3))`

I have an indexing array of integers, idx, of shape (m,), with each entry in the range [0, n). This array specifies which column should be selected from each row of matrix:

`idx = np.array([2, 0, 2, 1, 1])`

So, I am trying to select column 2 from row 0, column 0 from row 1, column 2 from row 2, column 1 from row 3, and column 1 from row 4. Thus the final answer should be:

`correct_result = np.array((2, 3, 8, 10, 13))`

I have tried the following, which is intuitive, but incorrect:

`incorrect_result = matrix[:, idx]`

What the above syntax does is apply idx as a fancy indexing array to every row, selecting all of the columns listed in idx each time. The result is another matrix of shape (m, m), which is not what I want.
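A quick check on the example arrays shows what that expression returns. This is only an illustrative sketch using the `matrix` and `idx` defined above: the slice yields one output column per entry of `idx`, and the values being sought end up on the diagonal.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])

# For every row, pick all of the columns listed in idx, in order.
# Element [i, j] of the result is matrix[i, idx[j]].
incorrect_result = matrix[:, idx]
print(incorrect_result.shape)  # one row per row of matrix, one column per entry of idx

# The desired values matrix[i, idx[i]] are the diagonal of that array.
print(np.diagonal(incorrect_result))
```

Taking the diagonal works for a toy example, but with m ~ 1e5 the intermediate array would have m * m entries, so it is not a practical route.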

What is the correct syntax for fancy indexing of this type?

Answer

```
m = matrix.shape[0]  # number of rows
correct_result = matrix[np.arange(m), idx]
```

The advanced indexing expression `matrix[I, J]` gives an output such that `output[n] == matrix[I[n], J[n]]`.

If we want `output[n] == matrix[n, idx[n]]`, then we need `I[n] == n` and `J[n] == idx[n]`, so we need `I` to be `np.arange(m)` and `J` to be `idx`.
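As a sanity check, here is a minimal sketch that runs the expression on the example from the question. The `np.take_along_axis` line is not part of the answer above; it is shown only as an equivalent alternative, assuming NumPy 1.15 or later.

```python
import numpy as np

matrix = np.arange(5 * 3).reshape((5, 3))
idx = np.array([2, 0, 2, 1, 1])
m = matrix.shape[0]

# Pair row i with column idx[i]: I = np.arange(m), J = idx.
result = matrix[np.arange(m), idx]

# Alternative (an addition, not from the answer): take_along_axis expects
# the index array to have the same number of dimensions as matrix, so idx
# is reshaped to a column of shape (m, 1) and the result is flattened again.
alt = np.take_along_axis(matrix, idx[:, None], axis=1)[:, 0]

assert result.tolist() == [2, 3, 8, 10, 13]
assert np.array_equal(result, alt)
```

Both forms produce an output of shape (m,) without building an intermediate (m, m) array.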