jf328 jf328 - 2 months ago 18
R Question

R fast method to find closest value in vector B for each element in A

I have a very large unsorted vector A and a sorted vector B (relatively short).

A = runif(n = 1e6)
B = seq(0,1,by = 1e-3)


Now given a direction 'forward' or 'backward', for each element in A, find the nearest element in B with that direction. Eg for 'forward'

A2 = sapply(A, function(x) B[B>=x][1])


gives the result. However, this is too slow as
sapply
loops over A.

> system.time(sapply(A, function(x) B[B>=x][1]))
user system elapsed
17.93 0.00 17.93


Is there a way to do this much faster?

(It is guaranteed that
min(B)<min(A)
and
max(B)>max(A)
, if this is useful)

Answer

The findInterval function solves this exact problem, using binary search. Try this:

B[findInterval(A,B)+1]

A comparison:

set.seed(44)
A <- runif(n = 1e6)
B <- seq(0,1,by = 1e-3)
system.time(A2<-sapply(A, function(x) B[B>=x][1]))
#   user  system elapsed 
# 18.058   0.000  15.606
system.time(A3<-B[findInterval(A,B)+1])  
#   user  system elapsed 
#   0.00    0.00    0.07
identical(A2,A3)
#[1] TRUE
Comments