ywat - 10 months ago

Python Question

I think this is an easy question for experienced numpy users.

I have a score matrix. The row index corresponds to samples and the column index corresponds to items. For example,

```
score_matrix =
[[ 1. , 0.3, 0.4],
 [ 0.2, 0.6, 0.8],
 [ 0.1, 0.3, 0.5]]
```

I want to get the indices of the top-M items for each sample, and I also want the corresponding top-M scores. For example,

```
top2_ind =
[[0, 2],
 [2, 1],
 [2, 1]]
```

```
top2_score =
[[1. , 0.4],
 [0.8, 0.6],
 [0.5, 0.3]]
```

What is the best way to do this using numpy?

Answer

I'd use `argsort()`:

```
top2_ind = score_matrix.argsort()[:,::-1][:,:2]
```
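A self-contained version of the one-liner above, generalized from 2 to any M, might look like this. The helper name `top_m_indices` is hypothetical; it only wraps the same `argsort()`, column reversal and slice, and it assumes `score_matrix` is a 2-D NumPy array with one sample per row.

```python
import numpy as np

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

def top_m_indices(scores, m):
    """Hypothetical helper: column indices of the m largest scores per row, largest first."""
    ascending = scores.argsort(axis=1)  # per-row indices that sort each row ascending
    descending = ascending[:, ::-1]     # reverse the column order -> largest first
    return descending[:, :m]            # keep only the first m columns

top2_ind = top_m_indices(score_matrix, 2)
print(top2_ind)
```

Passing a different `m` gives top-M for any M up to the number of columns.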

That is, `argsort()` first produces an array containing the indices that would sort each row of `score_matrix` in ascending order:

```
array([[1, 2, 0],
[0, 1, 2],
[0, 1, 2]])
```

Then reverse the column order with `[:,::-1]` so the largest score comes first, and take the first two columns with `[:,:2]`:

```
array([[0, 2],
[2, 1],
[2, 1]])
```
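Once you have `top2_ind`, one option (not used in the answer itself) is to pull the matching scores straight out of `score_matrix` with `np.take_along_axis` (available since NumPy 1.15) rather than sorting a second time. This keeps every score paired with its index, which matters if a row contains tied scores. A minimal sketch, reusing the example matrix:

```python
import numpy as np

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

# Indices of the top 2 items per row, largest first (same as above)
top2_ind = score_matrix.argsort()[:, ::-1][:, :2]

# For every row i and column j, pick score_matrix[i, top2_ind[i, j]]
top2_score = np.take_along_axis(score_matrix, top2_ind, axis=1)
print(top2_score)
```

On older NumPy versions, fancy indexing does the same thing: `score_matrix[np.arange(len(score_matrix))[:, None], top2_ind]`.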

Do the same with regular `np.sort()` to get the values:

```
top2_score = np.sort(score_matrix)[:,::-1][:,:2]
```

Following the same mechanics as above, this gives you:

```
array([[ 1. , 0.4],
[ 0.8, 0.6],
[ 0.5, 0.3]])
```
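If each row has many items and M is small, fully sorting every row does more work than necessary. A swapped-in alternative (not the technique used above) is `np.argpartition`, which moves the M largest entries of each row into the last M positions, in no particular order, without a full sort; only those M entries are then sorted. A sketch assuming 1 <= M <= number of columns; the helper name `top_m` is hypothetical:

```python
import numpy as np

def top_m(scores, m):
    """Hypothetical helper: (indices, values) of the m largest entries per row, largest first.

    Assumes 1 <= m <= scores.shape[1].
    """
    n_items = scores.shape[1]
    # After partitioning around position n_items - m, the last m columns hold
    # the indices of the m largest scores in each row, in unspecified order.
    part = np.argpartition(scores, n_items - m, axis=1)[:, -m:]
    part_scores = np.take_along_axis(scores, part, axis=1)
    # Sort only those m columns, in descending order of score.
    order = np.argsort(-part_scores, axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    vals = np.take_along_axis(part_scores, order, axis=1)
    return idx, vals

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])
top2_ind, top2_score = top_m(score_matrix, 2)
print(top2_ind)
print(top2_score)
```

For a 3-column matrix like this one there is no speed benefit; the partition step only pays off when rows are long compared to M.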