Vladislav Ladenkov - 2 months ago
Python Question

Fastest way to calculate "cosine" metrics with scipy

I am given a matrix of ones and zeros. I need to find the 20 rows which have the highest cosine metric with respect to one specific row in the matrix.

If I have 10 rows, and the 5th is the specific one, I want to choose the highest values among these:

cosine(1row,5row),cosine(2row,5row),...,cosine(8row,5row),cosine(9row,5row)


First, I tried to compute the metrics directly.
This didn't work:

from scipy.spatial.distance import cosine

A = ratings[:, 100]
A = A.reshape(1, A.shape[0])     # A.shape == (1, 71869)
B = ratings.transpose()          # B.shape == (10000, 71869)
similarity = -cosine(A, B) + 1   # fails: cosine() expects two 1-D vectors


The error is:
Input vector should be 1-D.
I'd like to know how to implement this cleanly, without errors, but most importantly: which solution will be the fastest?
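The error arises because scipy.spatial.distance.cosine compares exactly two 1-D vectors. One way around it, sketched here under assumptions, is scipy.spatial.distance.cdist with metric="cosine", which accepts two 2-D arrays. The small random `ratings` array below is a hypothetical stand-in for the real data, which is not shown in the question.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical stand-in for the real ratings matrix (assumption, not the asker's data).
rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(50, 8)).astype(float)
ratings[0, :] = 1.0  # make sure no column is all zeros (that would give NaN distances)

A = ratings[:, 3].reshape(1, -1)   # the specific vector, shape (1, 50)
B = ratings.T                      # all vectors as rows, shape (8, 50)

# cdist returns cosine *distances*; similarity is 1 - distance.
similarity = 1.0 - cdist(A, B, metric="cosine")[0]   # shape (8,)
```

This removes the error, but it is not necessarily the fastest option for a large 0/1 matrix.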

In my opinion, the fastest way does not involve scipy at all.
We just have to take the positions of all ones in the specific row and look at those indices in all other rows. The rows with the highest overlap will have the highest metric.
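The index idea above can be sketched as follows. For 0/1 data, the number of shared ones equals the dot product, and the squared norm of a row is its count of ones, so the cosine metric is the overlap divided by the square root of the product of the two counts. The function name `cosine_by_overlap` and the tiny example matrix are illustrative assumptions. Note that the raw overlap alone does not rank rows correctly without this normalization.

```python
import numpy as np

def cosine_by_overlap(B, k):
    """Cosine similarity of every row of the 0/1 matrix B to row k (hypothetical helper)."""
    idx = np.flatnonzero(B[k])            # positions of ones in the specific row
    overlap = B[:, idx].sum(axis=1)       # shared ones = dot product for 0/1 data
    ones_per_row = B.sum(axis=1)          # squared norm of each 0/1 row
    denom = np.sqrt(ones_per_row * len(idx))
    # Rows with no ones get similarity 0 instead of a division by zero.
    return np.divide(overlap, denom, out=np.zeros(len(B)), where=denom > 0)

B = np.array([[1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 0]])
sims = cosine_by_overlap(B, 0)
```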

Are there any faster ways?

Answer

The fastest way is to use matrix operations: something like np.multiply(A, B) summed over each row, which is equivalent to a single matrix-vector product.
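A minimal sketch of the matrix-operation approach, assuming a dense NumPy array: one matrix-vector product gives all dot products at once, the row norms normalize them, and np.argpartition picks the top n without a full sort. The helper name `top_similar_rows` and the example data are hypothetical.

```python
import numpy as np

def top_similar_rows(B, k, n=20):
    """Indices and cosine similarities of the n rows most similar to row k."""
    B = np.asarray(B, dtype=float)
    norms = np.linalg.norm(B, axis=1)
    dots = B @ B[k]                        # all dot products in one operation
    denom = norms * norms[k]
    sims = np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
    sims[k] = -np.inf                      # exclude the specific row itself
    n = min(n, len(sims) - 1)
    top = np.argpartition(-sims, n - 1)[:n]    # unordered top n
    top = top[np.argsort(-sims[top])]          # order them, best first
    return top, sims[top]

B = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 0]])
idx, vals = top_similar_rows(B, k=0, n=2)
```

For a matrix as large as the one in the question, whether a dense product or a sparse representation is faster depends on how many ones the data contains, so timing both on the real data is advisable.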