ywat - 21 days ago 9
Python Question

# Get top-M indices and scores from score matrix using Numpy

I think this is an easy question for experienced numpy users.

I have a score matrix. The row index corresponds to samples and the column index corresponds to items. For example,

``````
score_matrix =
[[ 1. ,  0.3,  0.4],
[ 0.2,  0.6,  0.8],
[ 0.1,  0.3,  0.5]]
``````

I want to get the top-M item indices for each sample, along with the corresponding top-M scores. For example,

``````
top2_ind =
[[0, 2],
[2, 1],
[2, 1]]

top2_score =
[[1. , 0.4],
[0.8, 0.6],
[0.5, 0.3]]
``````

What is the best way to do this using numpy?

I'd use `argsort()`:

``````
top2_ind = score_matrix.argsort()[:,::-1][:,:2]
``````

That is, first produce an array of the indices that would sort each row of `score_matrix` in ascending order:

``````
array([[1, 2, 0],
       [0, 1, 2],
       [0, 1, 2]])
``````

Then reverse the column order with `[:,::-1]` so each row runs from highest to lowest score, and take the first two columns with `[:,:2]`:

``````
array([[0, 2],
       [2, 1],
       [2, 1]])
``````
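The steps above can be sketched end to end as follows. This is a minimal sketch that assumes `score_matrix` is a 2-D NumPy array holding the example values from the question; the intermediate names `order` and `descending` are illustrative only.

```python
import numpy as np

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

# Indices that would sort each row in ascending order (last axis by default).
order = score_matrix.argsort()
# Reverse the column order so the largest score in each row comes first.
descending = order[:, ::-1]
# Keep the first two columns: the indices of the top-2 items per sample.
top2_ind = descending[:, :2]
print(top2_ind)
```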

Then do the same with regular `np.sort()` to get the values:

``````
top2_score = np.sort(score_matrix)[:,::-1][:,:2]
``````

Following the same mechanics as above, this gives you:

``````
array([[ 1. ,  0.4],
       [ 0.8,  0.6],
       [ 0.5,  0.3]])
``````
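Since the top-M indices already encode the ordering, the scores can also be read from `score_matrix` with those indices instead of sorting a second time. The sketch below assumes NumPy 1.15 or later, where `np.take_along_axis` is available; the helper name `top_m` and its `m` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np

def top_m(scores, m):
    """Return (indices, values) of the m largest entries in each row.

    Hypothetical helper for illustration; not part of NumPy.
    """
    # Sort each row ascending, reverse to descending, keep the first m columns.
    ind = np.argsort(scores, axis=1)[:, ::-1][:, :m]
    # Gather the matching scores row by row using the selected indices.
    val = np.take_along_axis(scores, ind, axis=1)
    return ind, val

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

top2_ind, top2_score = top_m(score_matrix, 2)
print(top2_ind)
print(top2_score)
```

If the matrix has many columns and M is small, `np.argpartition` may be worth considering, since it can select the top M entries without fully sorting each row; the selected entries then still need to be ordered in a second step.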