Ozgur Alptekın - 1 month ago
R Question

How can I calculate the cosine similarity between the first row of my matrix and each of the other rows in R?

This is my_matrix:

ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704 194907960 196950156 194139014 153444738 192982501 192891196
1 237 0.00 0.00 0.00 0.00 0.00 0.00 0 0.01 0 0 0 0 0
2 261 0.01 0.00 0.00 0.00 0.00 0.00 0 0.00 0 0 0 0 0
3 290 0.00 0.00 0.01 0.01 0.00 0.00 0 0.00 0 0 0 0 0
4 483 0.00 0.00 0.00 0.00 0.00 0.01 0 0.00 0 0 0 0 0
5 533 0.00 0.01 0.00 0.00 0.00 0.00 0 0.00 0 0 0 0 0
6 534 0.00 0.00 0.00 0.00 0.01 0.00 0 0.00 0 0 0 0 0


My code is as follows:

b=my_matrix[1,2:length(my_matrix)]

for (i in nrow(my_matrix)) {
res[i]=cosine(b,my_matrix[i,2:length(my_matrix)])
}


I used the "lsa" package. I want to compute the cosine similarity between the vector b and every other row vector of my_matrix, but my code throws an error that says:

argument mismatch. Either one matrix or two vectors needed as input.


What should I do to fix my problem?
Many thanks in advance.

Answer

The "lsa" package, which is not available for R version 3.2.2, is not really necessary. You can compute the result yourself, using the definition of cosine similarity:

my_matrix <- as.matrix(my_matrix)  # Make sure that "my_matrix" is indeed a "matrix".
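v <- as.vector(my_matrix[1,-1])  # First row without the "ui" column.
M <- my_matrix[-1,-1]  # All other rows without the "ui" column.
cosSim <- ( M %*% v ) / sqrt( sum(v*v) * rowSums(M*M) )  # Dot products divided by the product of the norms.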

The first line is only necessary if my_matrix is not yet a matrix but a data.frame.
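
As a quick check of the formula, here is a minimal sketch on a small made-up matrix (the name toy and its values are purely illustrative, not the data from the question). Each entry of cosSim is the dot product of a row with v, divided by the product of the two Euclidean norms.

```r
# Toy data (hypothetical): the first column plays the role of "ui",
# the remaining columns are the feature values.
toy <- matrix(c(237, 1, 0, 2,
                261, 2, 0, 4,
                290, 0, 3, 0),
              nrow = 3, byrow = TRUE)

v <- as.vector(toy[1, -1])        # first row without the id column
M <- toy[-1, -1, drop = FALSE]    # remaining rows without the id column

# Dot products divided by the product of the Euclidean norms.
cosSim <- (M %*% v) / sqrt(sum(v * v) * rowSums(M * M))
print(cosSim)
```

In this toy data the second row is a multiple of the first, so its similarity is 1, while the third row is orthogonal to the first, so its similarity is 0. A row consisting only of zeros would give NaN, because its norm is zero.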

A possible explanation for the original error message shown in the question:

I guess the class of the object my_matrix that was used in the code presented in the question and caused the error message

argument mismatch. Either one matrix or two vectors needed as input.

was data.frame, not a matrix. If so, the arguments b and my_matrix[i,2:length(my_matrix)] in the call of the cosine function are again data.frames, rather than the two vectors that the function expects.
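
The sketch below illustrates this guess with a small made-up data.frame (df and its column names are hypothetical). It shows that selecting a single row of a data.frame still yields a data.frame, whereas unlist() or a prior as.matrix() gives a plain numeric vector. Whether cosine tests its inputs in exactly this way is an assumption here.

```r
# Hypothetical data.frame standing in for my_matrix.
df <- data.frame(ui = c(237, 261), a = c(0, 0.01), b = c(0.01, 0))

row_df <- df[1, -1]               # one row of a data.frame is still a data.frame
print(class(row_df))              # "data.frame"
print(is.vector(row_df))          # FALSE

row_vec <- unlist(df[1, -1])      # flatten the row into a named numeric vector
print(is.vector(row_vec))         # TRUE

row_mat <- as.matrix(df)[1, -1]   # after as.matrix(), a row is a numeric vector
print(is.vector(row_mat))         # TRUE
```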

As an aside:

Even if my_matrix is coerced to a matrix, the code in the question will still throw an error, since length(my_matrix) is then the total number of elements, which is larger than the number of columns, and hence my_matrix[i,2:length(my_matrix)] selects undefined columns. The i-th row of my_matrix without the first column is my_matrix[i,2:ncol(my_matrix)] or, shorter, my_matrix[i,-1].
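
Putting these corrections together, a loop-based version could look like the sketch below. It uses a small hand-written helper, cos_sim (a hypothetical name, not part of any package), instead of the package function, and toy data in place of the data from the question. Note also that for (i in nrow(x)) visits only the single value nrow(x), whereas seq_len(nrow(x)) iterates over all rows, and that res has to exist before it is assigned to by index.

```r
# Hypothetical helper: cosine similarity of two numeric vectors.
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x * x) * sum(y * y))

# Toy stand-in for my_matrix; the first column is the id column.
m <- matrix(c(237, 1, 0, 2,
              261, 2, 0, 4,
              290, 0, 3, 0),
            nrow = 3, byrow = TRUE)

b <- m[1, -1]                    # first row without the id column
res <- numeric(nrow(m))          # preallocate the result vector

for (i in seq_len(nrow(m))) {    # iterate over every row, not just the last
  res[i] <- cos_sim(b, m[i, -1])
}
print(res)
```

Here res[1] compares the first row with itself, so it is always 1 for a non-zero row. The vectorized formula in the answer avoids the loop altogether.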