jf328 - 9 months ago 53

R Question

I have a very large unsorted vector A and a sorted vector B (relatively short).

`A = runif(n = 1e6)`

B = seq(0,1,by = 1e-3)

Now given a direction 'forward' or 'backward', for each element in A, find the nearest element in B with that direction. Eg for 'forward'

`A2 = sapply(A, function(x) B[B>=x][1])`

gives the result. However, this is too slow as

`sapply`

`> system.time(sapply(A, function(x) B[B>=x][1]))`

user system elapsed

17.93 0.00 17.93

Is there a way to do this much faster?

(It is guaranteed that

`min(B)<min(A)`

`max(B)>max(A)`

Answer

The `findInterval`

function solves this exact problem, using binary search. Try this:

```
B[findInterval(A,B)+1]
```

A comparison:

```
set.seed(44)
A <- runif(n = 1e6)
B <- seq(0,1,by = 1e-3)
system.time(A2<-sapply(A, function(x) B[B>=x][1]))
# user system elapsed
# 18.058 0.000 15.606
system.time(A3<-B[findInterval(A,B)+1])
# user system elapsed
# 0.00 0.00 0.07
identical(A2,A3)
#[1] TRUE
```

Source (Stackoverflow)