aplavin - 1 year ago 65
R Question

# apply() is slow - how to make it faster or what are my alternatives?

I have quite a large data frame, about 10 million rows. It has columns `x` and `y`, and what I want is to compute

``````hypot <- function(x) {sqrt(x[1]^2 + x[2]^2)}
``````

for each row. Using `apply`, it would take a lot of time (about 5 minutes, extrapolating from smaller sizes) and memory.

That is too much for me, so I've tried a few different things:

• compiling the `hypot` function reduces the time by about 10%

• using functions from `plyr` greatly increases the running time.

What's the fastest way to do this thing?

What about `with(my_data, sqrt(x^2 + y^2))`? It operates on whole columns at once instead of calling a function for every row.

``````set.seed(101)
d <- data.frame(x=runif(1e5),y=runif(1e5))

library(rbenchmark)
``````

Two different per-row functions; the second, `hypot2`, takes advantage of vectorization within each row:

``````hypot <- function(x) sqrt(x[1]^2+x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))
``````
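As a quick sanity check (not part of the original benchmark), the two functions can be confirmed to agree on a single row. The 3-4-5 triangle below is just an illustrative input, and `h1`/`h2` are hypothetical names.

```r
hypot  <- function(x) sqrt(x[1]^2 + x[2]^2)
hypot2 <- function(x) sqrt(sum(x^2))

row <- c(x = 3, y = 4)      # illustrative input: one row with columns x and y

h1 <- unname(hypot(row))    # hypot() keeps the name of x[1], so drop it
h2 <- hypot2(row)           # sum() already drops the names

stopifnot(isTRUE(all.equal(h1, 5)),
          isTRUE(all.equal(h2, 5)))
```

Both return the same value; they differ only in how much work is done per call.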

Try compiling these too:

``````library(compiler)
chypot <- cmpfun(hypot)
chypot2 <- cmpfun(hypot2)

benchmark(sqrt(d[,1]^2+d[,2]^2),
          with(d,sqrt(x^2+y^2)),
          apply(d,1,hypot),
          apply(d,1,hypot2),
          apply(d,1,chypot),
          apply(d,1,chypot2),
          replications=50)
``````

Results:

``````                       test replications elapsed relative user.self sys.self
5       apply(d, 1, chypot)           50  61.147  244.588    60.480    0.172
6      apply(d, 1, chypot2)           50  33.971  135.884    33.658    0.172
3        apply(d, 1, hypot)           50  63.920  255.680    63.308    0.364
4       apply(d, 1, hypot2)           50  36.657  146.628    36.218    0.260
1 sqrt(d[, 1]^2 + d[, 2]^2)           50   0.265    1.060     0.124    0.144
2  with(d, sqrt(x^2 + y^2))           50   0.250    1.000     0.100    0.144
``````

As expected, the `with()` solution and the column-indexing solution à la Tyler Rinker are essentially identical; `hypot2` is nearly twice as fast as the original `hypot` (about 1.7 times, but still about 150 times slower than the vectorized solutions). As the OP already pointed out, compilation doesn't help much: it saves only about 4 to 7% here.
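The gap comes from how `apply()` works: it first coerces the data frame to a matrix and then calls the R function once per row, whereas the vectorized forms do the arithmetic on whole columns. The sketch below, on a deliberately small data frame, checks that the approaches return the same values. The `rowSums` variant is not in the benchmark above; it is an assumed alternative that also works with more than two numeric columns, and the column name `hypot` is only illustrative.

```r
set.seed(101)
d <- data.frame(x = runif(1000), y = runif(1000))  # small sample, for checking only

hypot2 <- function(x) sqrt(sum(x^2))

r_apply   <- apply(d, 1, hypot2)        # one R function call per row (slow)
r_with    <- with(d, sqrt(x^2 + y^2))   # whole-column arithmetic (fast)
r_rowsums <- sqrt(rowSums(d^2))         # assumed alternative, not benchmarked above

# unname() removes any row names so that only the values are compared
stopifnot(isTRUE(all.equal(unname(r_apply), r_with)),
          isTRUE(all.equal(unname(r_rowsums), r_with)))

# Store the result as a new column (the name "hypot" is illustrative)
d$hypot <- r_with
```

For the 10-million-row case in the question, the same `with()` call applies unchanged; only the size of `d` differs.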
