aplavin aplavin - 2 months ago 8
R Question

apply() is slow - how to make it faster or what are my alternatives?

I have a quite large data frame, about 10 millions of rows. It has columns

x
and
y
, and what I want is to compute

hypot <- function(x) {sqrt(x[1]^2 + x[2]^2)}


for each row. Using
apply
it would take a lot of time (about 5 minutes, interpolating from lower sizes) and memory.

But it seems to be too much for me, so I've tried different things:


  • compiling the
    hypot
    function reduces the time by about 10%

  • using functions from
    plyr
    greatly increases the running time.



What's the fastest way to do this thing?

Answer

What about with(my_data,sqrt(x^2+y^2)) ?

set.seed(101)
d <- data.frame(x=runif(1e5),y=runif(1e5))

library(rbenchmark)

Two different per-line functions, one taking advantage of vectorization:

hypot <- function(x) sqrt(x[1]^2+x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))

Try compiling these too:

library(compiler)
chypot <- cmpfun(hypot)
chypot2 <- cmpfun(hypot2)

benchmark(sqrt(d[,1]^2+d[,2]^2),
          with(d,sqrt(x^2+y^2)),
          apply(d,1,hypot),
          apply(d,1,hypot2),
          apply(d,1,chypot),
          apply(d,1,chypot2),
          replications=50)

Results:

                       test replications elapsed relative user.self sys.self
5       apply(d, 1, chypot)           50  61.147  244.588    60.480    0.172
6      apply(d, 1, chypot2)           50  33.971  135.884    33.658    0.172
3        apply(d, 1, hypot)           50  63.920  255.680    63.308    0.364
4       apply(d, 1, hypot2)           50  36.657  146.628    36.218    0.260
1 sqrt(d[, 1]^2 + d[, 2]^2)           50   0.265    1.060     0.124    0.144
2  with(d, sqrt(x^2 + y^2))           50   0.250    1.000     0.100    0.144

As expected the with() solution and the column-indexing solution à la Tyler Rinker are essentially identical; hypot2 is twice as fast as the original hypot (but still about 150 times slower than the vectorized solutions). As already pointed out by the OP, compilation doesn't help very much.