user3067923 - 9 months ago 143

R Question

I have the following function (`funtest`) to test whether a specific vector exists as a row of a matrix. The vector will always have length 2 and the matrix will always have two columns. The function works fine; I would just like to make it faster (ideally much faster), because my matrices can have hundreds to thousands of rows.

```
x = c(1,2)
set.seed(100)
m <- matrix(sample(c(1,-2,3,4), 500*2, replace=TRUE), ncol=2)
funtest(m,x)
[1] TRUE
```

This is how fast it currently is:

```
library(microbenchmark)
microbenchmark(funtest(m, x), times=100)
Unit: milliseconds
          expr      min       lq     mean   median       uq      max neval
 funtest(m, x) 1.501247 1.536157 1.674668 1.567826 1.708293 2.900046   100
```

This is the function:

```
funtest = function(m, x) {
  out = any(apply(m,1,function(n,x) all(n==x),x=x))
  return(out)
}
```

Answer

How about

```
paste(x[1], x[2], sep='&') %in% paste(m[,1], m[,2], sep='&')
```

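This should be very efficient. It is based on matching: each row of `m` is pasted into a single string key, and `%in%` looks for the key built from `x` among them. As soon as the first match is found, no further search is done.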
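As a minimal sketch, the one-liner can be wrapped in a function and checked against the original `funtest`. The name `funtest_paste` and the small matrix `m_small` are made up for illustration; neither appears in the question.

```r
# Original approach from the question: compare x against every row.
funtest <- function(m, x) {
  any(apply(m, 1, function(n, x) all(n == x), x = x))
}

# Hypothetical wrapper (the name is an assumption) around the paste/%in% idea.
funtest_paste <- function(m, x) {
  # Build one string key per row, then look up the key built from x.
  paste(x[1], x[2], sep = "&") %in% paste(m[, 1], m[, 2], sep = "&")
}

# Small deterministic matrix, chosen only for illustration.
# matrix() fills column-wise, so the rows are (1,2), (3,1), (4,4).
m_small <- matrix(c(1, 3, 4,
                    2, 1, 4), ncol = 2)

funtest_paste(m_small, c(1, 2))   # (1,2) is a row of m_small
funtest_paste(m_small, c(2, 1))   # order matters: (2,1) is not a row

# Both implementations should agree on the same input.
identical(funtest(m_small, c(1, 2)), funtest_paste(m_small, c(1, 2)))
```

To compare timings on your own data, both functions can be passed to `microbenchmark()` in one call, for example `microbenchmark(funtest(m, x), funtest_paste(m, x), times = 100)`.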

However, I am sure this is not the fastest option. The optimal solution would be to write this operation in C with a single while loop, but the potential speedup should be no more than a factor of 2.
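The control flow such a loop would follow can be sketched in plain R. This is only an illustration of the logic, not the C implementation itself: `funtest_loop` and `m_small` are made-up names, the sketch assumes `m` and `x` contain no `NA` values, and an interpreted R loop is not expected to reach the speed of compiled code.

```r
# Hypothetical helper (the name is an assumption): scan the rows one at a
# time and stop at the first row equal to x. Assumes no NA values.
funtest_loop <- function(m, x) {
  i <- 1L
  n <- nrow(m)
  while (i <= n) {
    if (m[i, 1] == x[1] && m[i, 2] == x[2]) {
      return(TRUE)   # early exit: remaining rows are never inspected
    }
    i <- i + 1L
  }
  FALSE              # reached the end without finding a matching row
}

# Small deterministic matrix, chosen only for illustration.
# matrix() fills column-wise, so the rows are (1,2), (3,1), (4,4).
m_small <- matrix(c(1, 3, 4,
                    2, 1, 4), ncol = 2)

funtest_loop(m_small, c(3, 1))   # (3,1) is the second row
funtest_loop(m_small, c(1, 3))   # (1,3) is not a row
```

Unlike the `paste` version, this loop never builds string keys; it compares the two columns directly and touches only as many rows as it needs.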