Amir Nikooienejad - 3 months ago 25

R Question

My primary goal is to select non-contiguous submatrix using two sets of binary vectors for rows and columns. This is one of many steps I need to do for my MCMC loop which I'm implementing in C++ using Rcpp, RcppArmadillo and RcppEigen.

Three potential ways to do this was (1)using RcppArmadillo, (2) calling my R function from Rcpp and (3) using R directly and pass the results to C++. Although the last option is not convenient for me at all.

I then compared the performance speed of those three scenarios. Interestingly, the direct R code is much faster than the other two! What is more surpising to me is when I call the exact R function from Rcpp, it is much slower than when I call it directly from R. I expected those to have relatively same running speed as was suggested in an example in this older post.

Anyway, the timing results seem a bit strange to me. Any comments on the reason? I use Macbook Pro with El Capitan OS, 2.5 Ghz Intel Core i7. Could it be related to my system, Mac OSX or the way Rcpp is installed on my machine?

Thanks in advance!

Here is the code:

**CPP Section:**

`#include <RcppArmadillo.h>`

// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;

using namespace arma;

// (1) Using RcppArmadillo functions:

// [[Rcpp::export]]

mat subselect(NumericMatrix X, uvec rows, uvec cols){

mat XX(X.begin(), X.nrow(),X.ncol(), false);

mat y = XX.submat(find(rows>0),find(cols>0));

return (y);

}

// (2) Calling the function from R:

// [[Rcpp::export]]

NumericalMatrix subselect2(NumericMatrix X, NumericVector rows, NumericVector cols){

Environment stats;

Function submat = stats["submat"];

NumericMatrix outmat=submat(X,rows,cols);

return(wrap(outmat));

}

`library(microbenchmark)`

# (3) My R function:

submat <- function(mat,rvec,cvec){

return(mat[as.logical(rvec),as.logical(cvec)])

}

# Comparing the performances:

// Generating data:

set.seed(432)

rows <- rbinom(1000,1,0.1)

cols <- rbinom(1000,1,0.1)

amat <- matrix(1:1e06,1000,1000)

//benchmarking:

microbenchmark(subselect(amat,rows,cols),

subselect2(amat,rows,cols),

submat(amat,rows,cols))

`expr min lq mean median uq max neval`

subselect(amat, rows, cols) 893.670 1566.882 2297.991 1675.282 2184.783 8462.142 100

subselect2(amat, rows, cols) 928.418 1581.553 3554.805 1657.454 2060.837 138801.050 100

submat(amat, rows, cols) 36.313 55.748 66.782 62.709 73.975 136.970 100

Answer

There are a couple of things worth addressing here. First, you made a subtle mistake in the design of your benchmark that had a significant impact on the performance on your Armadillo function, `subselect`

. Observe:

```
set.seed(432)
rows <- rbinom(1000, 1, 0.1)
cols <- rbinom(1000, 1, 0.1)
imat <- matrix(1:1e6, 1000, 1000)
nmat <- imat + 0.0
storage.mode(imat)
# [1] "integer"
storage.mode(nmat)
# [1] "double"
microbenchmark(
"imat" = subselect(imat, rows, cols),
"nmat" = subselect(nmat, rows, cols)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# imat 3088.140 3218.013 4355.2956 3404.4685 4585.1095 21662.540 100
# nmat 139.298 167.116 223.2271 209.4585 238.6875 533.035 100
```

Although R often treats integer literals (e.g. 1, 2, 3, ...) as floating point values, the sequence operator `:`

is one of the few exceptions to this,

```
storage.mode(c(1, 2, 3, 4, 5))
# [1] "double"
storage.mode(1:5)
# [1] "integer"
```

which is why the expression `matrix(1:1e6, 1000, 1000)`

returns an `integer`

matrix, not a `numeric`

matrix. This is problematic because `subselect`

is expecting a `NumericMatrix`

, not an `IntegerMatrix`

, and passing it the latter type triggers a deep copy, hence the difference of more than an order of magnitude in the above benchmark.

Second, there is a noticeable difference between the relative performance of the R function `submat`

and the C++ function `subselect`

over the distribution of your binary indexing vectors, which is presumably due to a difference in underlying algorithms. For more sparse indexing (a greater proportion of 0s than 1s), the R function wins out; and for more dense indexing, the opposite is true. This also seems to be a function of matrix size (or possibly just dimension), as shown in the plots below, where row and column index vectors are generated using `rbinom`

with success parameters 0.0, 0.05, 0.10, ..., 0.95, 1.0 -- first with a 1e3 x 1e3 matrix, and then with a 1e3 x 1e4 matrix. The code for this is included at the end.

**Benchmark code:**

```
library(data.table)
library(microbenchmark)
library(ggplot2)
test_data <- function(nr, nc, p, seed = 123) {
set.seed(seed)
list(
x = matrix(rnorm(nr * nc), nr, nc),
rv = rbinom(nr, 1, p),
cv = rbinom(nc, 1, p)
)
}
tests <- lapply(seq(0, 1, 0.05), function(p) {
lst <- test_data(1e3, 1e3, p)
list(
p = p,
benchmark = microbenchmark::microbenchmark(
R = submat(lst[[1]], lst[[2]], lst[[3]]),
Arma = subselect(lst[[1]], lst[[2]], lst[[3]])
)
)
})
gt <- rbindlist(
Map(function(g) {
data.table(g[[2]])[
,.(Median.us = median(time / 1000)),
by = .(Expr = expr)
][order(Median.us)][
,Relative := Median.us / min(Median.us)
][,pSuccess := sprintf("%3.2f", g[[1]])]
}, tests)
)
ggplot(gt) +
geom_point(
aes(
x = pSuccess,
y = Relative,
color = Expr
),
size = 2,
alpha = 0.75
) +
theme_bw() +
ggtitle("1e3 x 1e3 Matrix")
## change `test_data(1e3, 1e3, p)` to
## `test_data(1e3, 1e4, p)` inside of
## `tests <- lapply(...) ...` to generate
## the second plot
```