user3067923 - 4 months ago
R Question

Making a function that checks whether a vector exists in a matrix faster

I have the following function (funtest) that tests whether a specific vector exists as a row of a matrix. The vector will always have length 2 and the matrix will always have two columns. The function works fine; I would just like to make it faster (ideally much faster), because my matrices can have hundreds to thousands of rows.

x = c(1,2)

set.seed(100)
m <- matrix(sample(c(1,-2,3,4), 500*2, replace=TRUE), ncol=2)

funtest(m,x)
[1] TRUE


This is how fast it currently is:

library(microbenchmark)
microbenchmark(funtest(m, x), times=100)
Unit: milliseconds
          expr      min       lq     mean   median       uq      max neval
 funtest(m, x) 1.501247 1.536157 1.674668 1.567826 1.708293 2.900046   100


This is the function:

funtest = function(m, x) {
  # Compare each row of m with x; TRUE if at least one row matches in every element
  out = any(apply(m, 1, function(n, x) all(n == x), x = x))
  return(out)
}

Answer

How about

paste(x[1], x[2], sep='&') %in% paste(m[,1], m[,2], sep='&')

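This should be very efficient. It is based on matching: `%in%` is a thin wrapper around `match()`, so the search only needs to find the first match. Keep in mind that `paste()` is vectorized and still builds one string for every row of `m` before any matching happens.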
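To make the one-liner easier to benchmark against funtest, it can be wrapped in a function. This is only a minimal sketch: the name `funtest_paste` and the small matrix `m_small` are made up for illustration and are not part of the original answer.

```r
# Hypothetical wrapper around the paste/%in% one-liner (name is illustrative).
funtest_paste <- function(m, x) {
  # Encode the target vector and every row of m as "a&b" strings,
  # then test whether the target string occurs among the row strings.
  paste(x[1], x[2], sep = "&") %in% paste(m[, 1], m[, 2], sep = "&")
}

# Small hand-checkable matrix (filled column-wise): rows are (1, 2), (3, 4), (1, -2).
m_small <- matrix(c(1, 3, 1,
                    2, 4, -2), ncol = 2)

funtest_paste(m_small, c(1, 2))  # the row (1, 2) is present
funtest_paste(m_small, c(2, 1))  # order matters, so (2, 1) is not a row
```

With the question's own `m` and `x`, something like `microbenchmark(funtest(m, x), funtest_paste(m, x), times=100)` should show the difference on your machine.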

That said, I doubt this is the fastest possible approach. The optimal solution would be to write the operation in C as a single while loop, though I would expect the speedup over the line above to be no more than a factor of 2.
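For illustration, the logic such a single while loop would follow can be sketched in plain R. This is not the C implementation itself, and an interpreted R loop should not be assumed to be faster than the vectorized line above. It only shows the early-exit idea. The name `funtest_loop` is hypothetical, and the sketch assumes `m` contains no `NA` values.

```r
# Illustrative early-exit scan (hypothetical name); it mirrors the logic a
# single-while-loop C routine could use. Assumes m has two columns and no NAs.
funtest_loop <- function(m, x) {
  i <- 1L
  n <- nrow(m)
  while (i <= n) {
    # Stop at the first row whose two entries both equal x.
    if (m[i, 1] == x[1] && m[i, 2] == x[2]) return(TRUE)
    i <- i + 1L
  }
  FALSE  # scanned every row without finding a match
}

# Small hand-checkable matrix (filled column-wise): rows are (1, 2), (3, 4), (1, -2).
m_small <- matrix(c(1, 3, 1,
                    2, 4, -2), ncol = 2)

funtest_loop(m_small, c(3, 4))  # found at the second row, loop exits early
funtest_loop(m_small, c(4, 3))  # no such row, loop runs to the end
```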