Ken Williams - 1 year ago 105

# findInterval() with right-closed intervals

The great `findInterval()` function in R uses left-closed sub-intervals in its `vec` argument, as shown in its docs:

if `i <- findInterval(x,v)`, we have `v[i[j]] <= x[j] < v[i[j] + 1]`

If I want right-closed sub-intervals, what are my options? The best I've come up with is this:

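``````
findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  # Step back one interval wherever x sits exactly on a breakpoint.
  # Note: assumes every x >= vec[1]; zero indices are dropped by vec[fi].
  fi - (x == vec[fi])
}
``````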

Another one also works:

``````
findInterval.rightClosed2 <- function(x, vec, ...) {
  # Negate and reverse so that left-closed becomes right-closed.
  length(vec) - findInterval(-x, -rev(vec), ...)
}
``````

Here's a little test:

``````
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
findInterval(x, vec)
# [1] 1 3 3 3 3 4 4
findInterval.rightClosed(x, vec)
# [1] 1 2 3 3 3 4 4
findInterval.rightClosed2(x, vec)
# [1] 1 2 3 3 3 4 4
``````

But I'd like to see any other solutions if there's a better one. By "better", I mean "somehow more satisfying" or "doesn't feel like a kludge" or maybe even "more efficient". =)

(Note that there's a `rightmost.closed` argument to `findInterval()`, but it's different - it only refers to the final sub-interval and has a different meaning.)
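To illustrate that difference, here is a minimal sketch using the same `vec` as the test above (the variable names `default_idx` and `closed_idx` are just for illustration). With `rightmost.closed = TRUE`, only a value equal to the last breakpoint changes bins; a value on an interior breakpoint is still treated as left-closed.

```r
vec <- c(2, 5, 6, 35)
x <- c(6, 35)  # 6 is an interior breakpoint, 35 is the last breakpoint

# Default: every sub-interval is left-closed.
default_idx <- findInterval(x, vec)
print(default_idx)  # [1] 3 4

# rightmost.closed = TRUE only closes the final sub-interval [6, 35],
# so 35 moves into bin 3 while 6 stays where it was.
closed_idx <- findInterval(x, vec, rightmost.closed = TRUE)
print(closed_idx)   # [1] 3 3
```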

EDIT: Major clean-up in all aisles.

You might look at `cut`. By default, `cut` makes left-open, right-closed intervals, and that can be changed with the `right` argument. To use your example:

``````
x <- c(3, 6, 7, 7, 29, 37, 52)
vec <- c(2, 5, 6, 35)
cutVec <- c(vec, max(x)) # for cut, range of vec should cover all of x
``````

Now create four functions that should do the same thing: two from the OP, one from Josh O'Brien, and one based on `cut`. Two arguments to `cut` are changed from their defaults: `include.lowest = TRUE` closes the smallest (leftmost) interval on both sides, and `labels = FALSE` makes `cut` return just the integer bin indices instead of a factor.

``````
findInterval.rightClosed <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  fi - (x == vec[fi])
}
findInterval.rightClosed2 <- function(x, vec, ...) {
  length(vec) - findInterval(-x, -rev(vec), ...)
}
cutFun <- function(x, vec) {
  cut(x, vec, include.lowest = TRUE, labels = FALSE)
}
# The body of fiFun is a contribution by Josh O'Brien that got fed to the ether.
# It nudges each (positive) breakpoint up by a relative epsilon, so an x equal
# to a breakpoint falls into the interval below it.
fiFun <- function(x, vec) {
  findInterval(x, vec * (1 + .Machine$double.eps))
}
``````
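As a rough illustration of what those `cut` arguments do, here is a small sketch on three values that sit exactly on breakpoints (the names `breaks`, `xb`, and the result variables are hypothetical, chosen only for this example):

```r
breaks <- c(2, 5, 6, 35)
xb <- c(2, 5, 6)  # all three values sit exactly on a breakpoint

# Default: a factor with right-closed levels; 2 is outside (2,5], so it is NA.
as_factor <- cut(xb, breaks)
print(levels(as_factor))  # "(2,5]" "(5,6]" "(6,35]"

# labels = FALSE returns the integer bin codes instead of a factor.
codes <- cut(xb, breaks, labels = FALSE)
print(codes)              # [1] NA  1  2

# include.lowest = TRUE closes the leftmost interval, [2,5], so 2 lands in bin 1.
codes_lowest <- cut(xb, breaks, include.lowest = TRUE, labels = FALSE)
print(codes_lowest)       # [1] 1 1 2

# right = FALSE switches to left-closed intervals: [2,5), [5,6), [6,35).
codes_left <- cut(xb, breaks, right = FALSE, labels = FALSE)
print(codes_left)         # [1] 1 2 3
```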

Do all functions return the same result? Yup. (Notice the use of `cutVec` for `cutFun`.)

``````
mapply(identical, list(findInterval.rightClosed(x, vec)),
       list(findInterval.rightClosed2(x, vec), cutFun(x, cutVec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
``````
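One case this test does not exercise is an `x` below `vec[1]`. There `findInterval()` returns 0, and since R drops zero indices, `vec[fi]` in `findInterval.rightClosed` can come out shorter than `x`. A possible guard, as a sketch only (the name `findInterval.rightClosedSafe` is hypothetical and not from the original post), is to adjust just the positions with a valid index:

```r
# Hypothetical guarded variant; the name is illustrative only.
findInterval.rightClosedSafe <- function(x, vec, ...) {
  fi <- findInterval(x, vec, ...)
  ok <- which(fi > 0L)  # positions with a valid vec[fi]; which() also skips NAs
  # Step back one interval wherever x sits exactly on a breakpoint.
  fi[ok] <- fi[ok] - (x[ok] == vec[fi[ok]])
  fi
}

vec <- c(2, 5, 6, 35)
x_low <- c(1, 5, 3)  # 1 lies below vec[1]

safe_idx <- findInterval.rightClosedSafe(x_low, vec)
# Same computation as findInterval.rightClosed2, written inline for comparison.
rev_idx <- length(vec) - findInterval(-x_low, -rev(vec))

print(safe_idx)  # [1] 0 1 1
print(rev_idx)   # [1] 0 1 1
```

The negate-and-reverse approach needs no such guard, because it never indexes into `vec`.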

Now a more demanding vector to bin:

``````
x <- rpois(2e6, 10)
vec <- c(-Inf, quantile(x, seq(.2, 1, .2)))
``````

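Test whether the results are identical (note the use of `unname`: `vec` now carries the names set by `quantile()`, which `findInterval.rightClosed` passes through to its result):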

``````
mapply(identical, list(unname(findInterval.rightClosed(x, vec))),
       list(findInterval.rightClosed2(x, vec), cutFun(x, vec), fiFun(x, vec)))
# [1] TRUE TRUE TRUE
``````
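Why `unname` is needed can be seen on a tiny input. This is a sketch with made-up data (`vec_named` and `x_small` are hypothetical names); it repeats the arithmetic from the body of `findInterval.rightClosed` inline:

```r
# quantile() returns a named vector, so the breakpoints carry names.
vec_named <- c(-Inf, quantile(c(1, 2, 3, 4), c(0.5, 1)))
x_small <- c(2, 3)

fi <- findInterval(x_small, vec_named)
# vec_named[fi] keeps its names, and they propagate through == and -.
res <- fi - (x_small == vec_named[fi])

print(names(res))   # non-NULL: inherited from vec_named
print(unname(res))  # [1] 1 2
```

Because `identical()` also compares attributes, the named result would not match the unnamed output of the other functions without `unname`.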

And benchmark:

``````
library(microbenchmark)
microbenchmark(findInterval.rightClosed(x, vec), findInterval.rightClosed2(x, vec),
               cutFun(x, vec), fiFun(x, vec), times = 50)
# Unit: milliseconds
#                                expr       min        lq    median        uq       max
# 1                    cutFun(x, vec)  35.46261  35.63435  35.81233  36.68036  53.52078
# 2                     fiFun(x, vec)  51.30158  51.69391  52.24277  53.69253  67.09433
# 3  findInterval.rightClosed(x, vec) 124.57110 133.99315 142.06567 155.68592 176.43291
# 4 findInterval.rightClosed2(x, vec)  79.81685  82.01025  86.20182  95.65368 108.51624
``````

From this run, `cut` seems to be the fastest.
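One more option, if your R version has it: newer releases of R (3.3.0 and later, to the best of my knowledge) give `findInterval()` a `left.open` argument that makes the sub-intervals right-closed directly. A minimal sketch on the small example from the question, assuming that argument is available (`x_demo` and `vec_demo` are just copies of the question's data):

```r
x_demo <- c(3, 6, 7, 7, 29, 37, 52)
vec_demo <- c(2, 5, 6, 35)

# left.open = TRUE uses v[i] < x <= v[i + 1] instead of v[i] <= x < v[i + 1].
# Assumes an R version whose findInterval() supports left.open.
left_open_idx <- findInterval(x_demo, vec_demo, left.open = TRUE)
print(left_open_idx)  # [1] 1 2 3 3 3 4 4
```

This matches the output of the two wrapper functions in the question for this input; it was not part of the benchmark above, so no timing claim is made for it.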
