caseyk caseyk - 11 months ago 55
R Question

Rcpp/RcppArmadillo C++/R balance for Performance

When I benchmark small chunks of code or functions written in RcppArmadillo, I see speed gains ranging from incredible (55x vs. R for nested for loops in simple operations) to modest (1.3x vs. R for long-duration functions). Impressed with this, I decided to translate some 400 lines of code into one C++ function (with some small adjacent C++ functions) to replace the computationally intensive body of my R application.

Old Results:
The RcppArmadillo code is running ~3x slower than native R (update: perhaps a poor benchmark, working on it). A hybrid RcppArmadillo & R variant of the code is running ~1.10 times faster than R.

Update/Lessons Learned:
The C++-intensive code is running ~6x faster than the R & C++ hybrid code. For any passers-by, the mistakes I made are the following:

  • I did not account for variable instantiation in my R benchmark (an unfair comparison).

  • I misplaced some computations inside a nested C++ loop. In the R variant they were outside that loop (this includes a big FFT, etc.). Human error.

  • A large number of function arguments were being passed (large vectors and matrices, with casting, etc.). I solved this by porting more code.

  • I was not handling memory efficiently. Extra copies of objects were being made instead of using the originals directly. These copies aided readability, but likely damaged performance. Easy to fix.

  • I made use of global variables to reduce function-call overhead and to remove a few large temporary vectors from some computations (vector size: 2^15).
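The misplaced-computation mistake in the list above can be sketched in plain standard C++. This is only an illustration under stated assumptions: `expensive_weights` is a hypothetical stand-in for a costly loop-invariant step such as the FFT, `std::vector` stands in for the Armadillo types, and the function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Counts calls to the expensive step so the difference is observable.
static int expensive_calls = 0;

// Hypothetical stand-in for an expensive computation that does not depend
// on the loop variable (for example, a large FFT).
std::vector<double> expensive_weights(std::size_t n) {
    ++expensive_calls;
    std::vector<double> w(n);
    for (std::size_t k = 0; k < n; ++k) {
        w[k] = std::cos(0.001 * static_cast<double>(k));
    }
    return w;
}

// Slow variant: the invariant step is recomputed on every outer iteration.
double weighted_sum_slow(const std::vector<std::vector<double>>& rows) {
    double total = 0.0;
    for (const auto& row : rows) {
        const std::vector<double> w = expensive_weights(row.size());
        for (std::size_t j = 0; j < row.size(); ++j) {
            total += w[j] * row[j];
        }
    }
    return total;
}

// Fast variant: the invariant step is computed once, before the loops.
// Assumes every row has the same length.
double weighted_sum_fast(const std::vector<std::vector<double>>& rows) {
    if (rows.empty()) return 0.0;
    const std::size_t n = rows.front().size();
    const std::vector<double> w = expensive_weights(n);
    double total = 0.0;
    for (const auto& row : rows) {
        assert(row.size() == n);
        for (std::size_t j = 0; j < n; ++j) {
            total += w[j] * row[j];
        }
    }
    return total;
}
```

Both variants return the same value; the slow one calls the expensive step once per row, the fast one calls it once in total. With a real FFT over 2^15 elements in place of the stand-in, that difference would likely dominate the run time.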

Edit (sorry for discontinuities with the first comment on this post)
Old Questions:

  • Is instantiating global variables or preallocating memory advised in RcppArmadillo code, to avoid many variable declarations in a function body and R garbage collection? Or does Rcpp handle these in as timely a manner as R?

  • Am I correct in assuming that Rcpp loses some performance due to interfacing with R (protecting variables, garbage collection, etc.)? If so, where can I find the code (file name) or documentation where these operations are handled, so I can learn how to work with it better?

Any practical advice appreciated. I apologize for the vague question, I am new to SO, new to the RcppArmadillo package, and haven't written C++ in 10 years.

Answer Source

The RcppArmadillo code is running ~3x slower than native R. A hybrid RcppArmadillo & R variant of the code is running ~1.10 times faster than R.

You must be copying a lot. That is simply not plausible. Maybe time to profile your code?
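To make the copying point concrete, here is a minimal sketch in plain standard C++. `CountedBuffer` is a hypothetical type invented for this example that counts its own copies; it is not an Rcpp or Armadillo class, and how those libraries' types behave at the R boundary is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical buffer type that records how often it is copied.
struct CountedBuffer {
    std::vector<double> data;
    static int copies;

    explicit CountedBuffer(std::size_t n, double value = 0.0) : data(n, value) {}
    CountedBuffer(const CountedBuffer& other) : data(other.data) { ++copies; }
    CountedBuffer& operator=(const CountedBuffer&) = delete;
};
int CountedBuffer::copies = 0;

// Pass by value: the whole buffer is copied on every call.
double sum_by_value(CountedBuffer buf) {
    return std::accumulate(buf.data.begin(), buf.data.end(), 0.0);
}

// Pass by const reference: the caller's buffer is read in place, no copy.
double sum_by_reference(const CountedBuffer& buf) {
    return std::accumulate(buf.data.begin(), buf.data.end(), 0.0);
}
```

Calling `sum_by_value` on a buffer of 2^15 doubles increments `CountedBuffer::copies` once per call, while `sum_by_reference` leaves it unchanged. Inside a nested loop, that per-call copy is the kind of cost that can erase the gain from moving to C++.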

Take any posted example. Maybe from the Rcpp Gallery which has multiple posts. Maybe from the by now 250 (!!) CRAN packages using RcppArmadillo. Start with something simple, and time it.

You should see the ~50x factor for loops converted from R to C++. You should see something closer to maybe 1.5 times faster for purely vectorized code (as R tends to do more error checking than the code we write in small extensions).

Edit: Here is a trivial benchmark showing how trivial it is to set one up. You really can (and should !!) time all portions you suspect are inefficient. We all make errors, both in design and in execution. The good thing is that you have both tools and plenty of posted examples to guide you.

R> Rfunc <- function(N) { s <- 0; for (i in 1:N) for (j in 1:N) s <- s+1; s }
R> Rfunc(10)
[1] 100
R> library(Rcpp)
R> cppFunction("double Cppfunc(int N) { double s=0; for (int i=0; i<N; i++) for (int j=0; j<N; j++) s++; return(s); }")
R> Cppfunc(10)
[1] 100
R> library(rbenchmark)
R> N <- 1000
R> benchmark(Rfunc(N), Cppfunc(N), order="relative")[,1:4]
        test replications elapsed relative
2 Cppfunc(N)          100   0.073    1.000
1   Rfunc(N)          100  12.596  172.548
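The same habit of timing suspect portions can be applied inside the C++ code itself. The sketch below is one possible approach using only the standard library; `time_seconds` and `nested_count` are names invented for this example, and `nested_count` simply repeats the loop from `Cppfunc` in the session above.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double time_seconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Same nested loop as Cppfunc: counts n * n iterations.
double nested_count(int n) {
    assert(n >= 0);
    double s = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            s++;
        }
    }
    return s;
}
```

A call such as `time_seconds([&] { result = nested_count(1000); })` times one section in isolation. A single run is noisy, and an optimizing compiler may simplify a loop this trivial, so repeated runs (as `benchmark` does with its 100 replications) give a more trustworthy number.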