ywat ywat - 21 days ago 9
Python Question

Get top-M indices and scores from score matrix using Numpy

I think this is an easy question for experienced numpy users.

I have a score matrix. The row index corresponds to samples and the column index corresponds to items. For example,

score_matrix =
[[ 1. , 0.3, 0.4],
[ 0.2, 0.6, 0.8],
[ 0.1, 0.3, 0.5]]


I want to get the top-M item indices for each sample, along with the corresponding top-M scores. For example,

top2_ind =
[[0, 2],
[2, 1],
[2, 1]]

top2_score =
[[1. , 0.4],
[0.8, 0.6],
[0.5, 0.3]]


What is the best way to do this using numpy?

Answer

I'd use argsort():

top2_ind = score_matrix.argsort()[:,::-1][:,:2]

That is, first produce an array of the indices that would sort each row of score_matrix in ascending order:

array([[1, 2, 0],
       [0, 1, 2],
       [0, 1, 2]])

Then reverse the column order with ::-1 so that each row is in descending order, and take the first two columns with :2:

array([[0, 2],
       [2, 1],
       [2, 1]])
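
The argsort steps above can be put together as a small runnable sketch. The helper name top_m_indices and the parameter m are illustrative assumptions, not part of the original answer:

```python
import numpy as np

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

def top_m_indices(scores, m):
    """Return the column indices of the m largest scores in each row, best first.

    Hypothetical helper for illustration only.
    """
    order = scores.argsort(axis=1)   # indices that sort each row ascending
    return order[:, ::-1][:, :m]     # reverse to descending, keep the first m columns

top2_ind = top_m_indices(score_matrix, 2)
print(top2_ind)
```

For the example matrix this reproduces the top2_ind array from the question.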

Then do the same with regular np.sort() to get the values:

top2_score = np.sort(score_matrix)[:,::-1][:,:2]

Following the same mechanics as above, this gives you:

array([[ 1. ,  0.4],
       [ 0.8,  0.6],
       [ 0.5,  0.3]])
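
Note that the approach above sorts the matrix twice, once for the indices and once for the values. As a possible alternative, the sketch below computes the indices once and gathers the matching scores with np.take_along_axis (available in NumPy 1.15 and later), using np.argpartition so that only the M candidates in each row are fully sorted. This may help when there are many items and M is small. The helper name top_m is hypothetical and not from the original answer, and the sketch assumes 1 <= m <= number of columns:

```python
import numpy as np

score_matrix = np.array([[1.0, 0.3, 0.4],
                         [0.2, 0.6, 0.8],
                         [0.1, 0.3, 0.5]])

def top_m(scores, m):
    """Return (indices, values) of the m largest scores per row, best first.

    Hypothetical helper; assumes 1 <= m <= scores.shape[1].
    """
    # Put the m largest entries of each row in the last m columns (unordered).
    cand_ind = np.argpartition(scores, -m, axis=1)[:, -m:]
    cand_val = np.take_along_axis(scores, cand_ind, axis=1)
    # Sort only those m candidates into descending order.
    order = np.argsort(-cand_val, axis=1)
    ind = np.take_along_axis(cand_ind, order, axis=1)
    val = np.take_along_axis(cand_val, order, axis=1)
    return ind, val

top2_ind, top2_score = top_m(score_matrix, 2)
print(top2_ind)
print(top2_score)
```

On the example matrix this should give the same top2_ind and top2_score as the argsort and np.sort versions. With tied scores, the order among equal entries is not guaranteed to match.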