jf328 - 2 months ago 18
R Question

# R fast method to find closest value in vector B for each element in A

I have a very large unsorted vector A and a sorted vector B (relatively short).

``````A = runif(n = 1e6)
B = seq(0,1,by = 1e-3)
``````

Now given a direction 'forward' or 'backward', for each element in A, find the nearest element in B with that direction. Eg for 'forward'

``````A2 = sapply(A, function(x) B[B>=x][1])
``````

gives the result. However, this is too slow as
`sapply`
loops over A.

``````> system.time(sapply(A, function(x) B[B>=x][1]))
user  system elapsed
17.93    0.00   17.93
``````

Is there a way to do this much faster?

(It is guaranteed that
`min(B)<min(A)`
and
`max(B)>max(A)`
, if this is useful)

The `findInterval` function solves this exact problem, using binary search. Try this:

``````B[findInterval(A,B)+1]
``````

A comparison:

``````set.seed(44)
A <- runif(n = 1e6)
B <- seq(0,1,by = 1e-3)
system.time(A2<-sapply(A, function(x) B[B>=x][1]))
#   user  system elapsed
# 18.058   0.000  15.606
system.time(A3<-B[findInterval(A,B)+1])
#   user  system elapsed
#   0.00    0.00    0.07
identical(A2,A3)
#[1] TRUE
``````