Ozgur Alptekın - 11 months ago 55

R Question

This is `my_matrix`:

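| | ui | 194635691 | 194153563 | 177382028 | 177382031 | 195129144 | 196972549 | 196258704 | 194907960 | 196950156 | 194139014 | 153444738 | 192982501 | 192891196 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 237 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.01 | 0 | 0 | 0 | 0 | 0 |
| 2 | 261 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 |
| 3 | 290 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 |
| 4 | 483 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 |
| 5 | 533 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 |
| 6 | 534 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0 | 0.00 | 0 | 0 | 0 | 0 | 0 |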

My code is as follows:

```
b=my_matrix[1,2:length(my_matrix)]
for (i in nrow(my_matrix)) {
res[i]=cosine(b,my_matrix[i,2:length(my_matrix)])
}
```

I used the "lsa" package. I want to compute the cosine similarity between the vector `b` and every other row vector of the matrix, but my code throws an error that says:

`argument mismatch. Either one matrix or two vectors needed as input.`

What should I do to fix my problem?

Many thanks in advance.

Answer Source

The "lsa" package, which is not available for R version 3.2.2, is not really necessary. Just do it yourself, using the definition of cosine similarity:

```
my_matrix <- as.matrix(my_matrix) # Make sure that "my_matrix" is indeed a "matrix".
v <- as.vector(my_matrix[1,-1])
M <- my_matrix[-1,-1]
cosSim <- ( M %*% v ) / sqrt( sum(v*v) * rowSums(M*M) )
```

The first line is only necessary if `my_matrix` is not yet a `matrix` but a `data.frame`.
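As a quick sanity check of the formula, here is a minimal sketch on made-up data. The matrix `toy` and its values are hypothetical and not taken from the question; its first column stands in for `ui` and is dropped in the same way as above.

```r
# Hypothetical toy data: the first column is an id, the rest are features.
toy <- matrix(c(1, 1, 0,
                2, 1, 0,
                3, 0, 1,
                4, 1, 1),
              ncol = 3, byrow = TRUE)

v <- as.vector(toy[1, -1])        # reference vector: (1, 0)
M <- toy[-1, -1, drop = FALSE]    # remaining rows, id column removed

# Cosine similarity of v with every row of M (one value per row).
cosSim <- (M %*% v) / sqrt(sum(v * v) * rowSums(M * M))
print(cosSim)
```

The three rows of `M` are parallel to `v`, orthogonal to `v`, and at 45 degrees to `v`, so the values should be 1, 0 and about 0.707. Note that a row of `M` containing only zeros leads to `0/0`, which R reports as `NaN`.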

*A possible explanation for the original error message shown in the question:*

I guess the class of the object `my_matrix` that was used in the code presented in the question and caused the error message

`argument mismatch. Either one matrix or two vectors needed as input.`

was `data.frame`, not `matrix`. If so, the arguments `b` and `my_matrix[i,2:length(my_matrix)]` in the call of the `cosine` function are again data.frames, not two vectors as expected.
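The following sketch illustrates that guess with a small, hypothetical data frame `df` (its column names `x1` and `x2` and its values are made up): selecting one row of a data frame yields another data frame, and `unlist()` is one way to turn it into a plain numeric vector.

```r
# Hypothetical two-row data frame standing in for my_matrix.
df <- data.frame(ui = c(237, 261), x1 = c(0, 0.01), x2 = c(0.01, 0))

# For a data frame, length() is the number of columns, so this selects
# columns 2 and 3 of the first row.
row1 <- df[1, 2:length(df)]
print(class(row1))            # still a data.frame, not a vector

# Flatten the one-row data frame into a plain numeric vector.
b <- unlist(row1, use.names = FALSE)
print(is.numeric(b) && is.vector(b))
```

A vector obtained this way (or a row taken from a real `matrix`) is the kind of input a two-vector cosine function would presumably accept.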

*As an aside:*

Even if `my_matrix` is coerced to a `matrix`, the code in the question will throw an error message: for a matrix, `length(my_matrix)` is the total number of elements (rows times columns), which is larger than the number of columns, and hence `my_matrix[i,2:length(my_matrix)]` selects undefined columns. The `i`-th row of `my_matrix` without the first column is `my_matrix[i,2:ncol(my_matrix)]` or, shorter, `my_matrix[i,-1]`.
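For completeness, a loop in the spirit of the question could look roughly like the sketch below. It is only an illustration under assumptions: `cos_sim` is a hypothetical helper written from the definition (a stand-in for the package's `cosine` function), and `m` is made-up data. It also iterates over `seq_len(nrow(m))`, because `for (i in nrow(m))` runs just once, with `i` equal to the number of rows.

```r
# Hypothetical stand-in for a two-vector cosine function.
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x * x) * sum(y * y))

# Hypothetical toy matrix: the first column is an id.
m <- matrix(c(1, 1, 0,
              2, 0, 1,
              3, 1, 1),
            ncol = 3, byrow = TRUE)

print(length(m))  # total number of elements (rows times columns)
print(ncol(m))    # number of columns

b <- m[1, -1]                    # first row without the id column
res <- numeric(nrow(m))          # preallocate the result vector
for (i in seq_len(nrow(m))) {    # seq_len() visits every row
  res[i] <- cos_sim(b, m[i, -1])
}
print(res)
```

The first entry of `res` compares `b` with itself and should therefore be 1. The vectorised version in the answer avoids the loop entirely and is usually the more idiomatic choice in R.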