digEmAll - 1 month ago
R Question

Should SEXP function args be PROTECTed when put inside an Rcpp::XPtr?

Look at the (oversimplified) Rcpp + R code below:

test.cpp :

#include <Rcpp.h>
using namespace Rcpp;

class VecWrap{
public:
    SEXP vector;
    int type;

    VecWrap(SEXP vector)
    {
        this->vector = vector;
        this->type = TYPEOF(vector);
        if(this->type != INTSXP && this->type != REALSXP)
            stop("invalid type");
    }

    bool contains(double val){
        if(type == INTSXP){
            IntegerVector v = vector;
            for(int i = 0; i < v.size(); i++)
                if(v[i] == val)
                    return true;
        }else if(type == REALSXP){
            NumericVector v = vector;
            for(int i = 0; i < v.size(); i++)
                if(v[i] == val)
                    return true;
        }
        return false;
    }
};

// [[Rcpp::export]]
SEXP createVecWrap(SEXP x) {
    VecWrap* w = new VecWrap(x);
    return XPtr< VecWrap >(w);
}

// [[Rcpp::export]]
SEXP vecWrapContains(XPtr< VecWrap > w, double val){
    return wrap(w->contains(val));
}


test.R :

library(Rcpp)
sourceCpp(file='test.cpp')

v <- 1:10e7

w <- createVecWrap(v)
vecWrapContains(w, 10000) # it works

# remove v and call the garbage collector
rm(v)
gc()

vecWrapContains(w, 10000) # R crashes (but it works with small vector "v")


Basically, I put the SEXP vector received as the argument of the createVecWrap function inside the custom class VecWrap, in order to use it later.

But, as explained by the comments in the code, if I remove the vector v from the R side and call the garbage collector, the R process crashes when I try to access the vector.

Should the vector be protected from the GC in some way? If so, how? (Rcpp-style if possible)

Answer

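Generally speaking, you should try to stick to the C++ type system / Rcpp classes as much as possible (i.e., avoid handling SEXP directly if you can). A raw SEXP member is just a pointer that R's garbage collector knows nothing about, so once v is removed on the R side the vector can be collected and the pointer is left dangling. The RObject class, on the other hand, protects the SEXP it holds from garbage collection for as long as the RObject itself lives, and it seems to work in this case: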

#include <Rcpp.h>

class VecWrap {
public:
    Rcpp::RObject vector;
    int type;

    VecWrap(SEXP vector_)
        : vector(vector_)
    {
        type = vector.sexp_type();
        if (type != INTSXP && type != REALSXP) {
            Rcpp::stop("invalid type");
        }

    }

    bool contains(double val) {
        if (type == INTSXP){
            Rcpp::IntegerVector v = Rcpp::as<Rcpp::IntegerVector>(vector);
            for (int i = 0; i < v.size(); i++) {
                if (v[i] == val) return true;
            }
        } else if (type == REALSXP) {
            Rcpp::NumericVector v = Rcpp::as<Rcpp::NumericVector>(vector);
            for (int i = 0; i < v.size(); i++) {
                if (v[i] == val) return true;
            }
        }
        return false;
    }
};

// [[Rcpp::export]]
Rcpp::XPtr<VecWrap> createVecWrap(SEXP x) {
    return Rcpp::XPtr<VecWrap>(new VecWrap(x));
}

// [[Rcpp::export]]
bool vecWrapContains(Rcpp::XPtr<VecWrap> w, double val) {
    return w->contains(val);
}

v <- 1:10e7
w <- createVecWrap(v)
vecWrapContains(w, 10000)
# [1] TRUE

rm(v)
gc()
#             used  (Mb) gc trigger   (Mb)  max used  (Mb)
# Ncells    366583  19.6     750400   40.1    460000  24.6
# Vcells 100559876 767.3  145208685 1107.9 100560540 767.3

vecWrapContains(w, 10000)
# [1] TRUE
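
To see why the RObject member makes the difference, here is a toy model in plain C++ (no R headers involved). It is only a sketch of the idea, not how R or Rcpp are actually implemented: ToyHeap, RawWrap, ProtectedWrap and survives_gc are all hypothetical names made up for illustration. The point it tries to show is that a bare handle tells the collector nothing, while an RAII member registers the object on construction and releases it on destruction, which is roughly the role RObject plays inside VecWrap.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a garbage-collected heap (hypothetical, NOT R's real GC):
// collect() frees every object that nobody has asked to preserve.
class ToyHeap {
public:
    using Id = std::size_t;

    Id allocate(std::vector<int> data) {
        Id id = next_id_++;
        objects_[id] = std::move(data);
        return id;
    }

    void preserve(Id id) { ++preserve_count_[id]; }

    void release(Id id) {
        auto it = preserve_count_.find(id);
        if (it != preserve_count_.end() && --it->second == 0) {
            preserve_count_.erase(it);
        }
    }

    // Free all objects with no outstanding preserve() calls.
    void collect() {
        for (auto it = objects_.begin(); it != objects_.end();) {
            if (preserve_count_.count(it->first) == 0) {
                it = objects_.erase(it);
            } else {
                ++it;
            }
        }
    }

    bool alive(Id id) const { return objects_.count(id) != 0; }

private:
    Id next_id_ = 0;
    std::map<Id, std::vector<int>> objects_;
    std::map<Id, int> preserve_count_;
};

// Analogous to the question's VecWrap: keeps the handle, tells the heap nothing.
struct RawWrap {
    ToyHeap::Id id;
    explicit RawWrap(ToyHeap::Id id_) : id(id_) {}
};

// Analogous to holding an RObject member: preserve on construction,
// release on destruction (RAII).
class ProtectedWrap {
public:
    ProtectedWrap(ToyHeap& heap, ToyHeap::Id id) : heap_(heap), id_(id) {
        heap_.preserve(id_);
    }
    ~ProtectedWrap() { heap_.release(id_); }
    ProtectedWrap(const ProtectedWrap&) = delete;
    ProtectedWrap& operator=(const ProtectedWrap&) = delete;

private:
    ToyHeap& heap_;
    ToyHeap::Id id_;
};

// Replays the R session from the question against the toy heap and reports
// whether the wrapped object is still alive after "rm(v); gc()".
bool survives_gc(bool use_protection) {
    ToyHeap heap;
    ToyHeap::Id v = heap.allocate({1, 2, 3});
    heap.preserve(v);  // the R variable `v` keeps the object reachable

    RawWrap raw(v);    // stores the handle only
    std::unique_ptr<ProtectedWrap> guard;
    if (use_protection) {
        guard.reset(new ProtectedWrap(heap, v));
    }

    heap.release(v);   // rm(v)
    heap.collect();    // gc()
    return heap.alive(raw.id);
}
```

Calling survives_gc(false) mirrors the original VecWrap (the object is gone after the collection), while survives_gc(true) mirrors the RObject version. In the real code the release step happens when the XPtr's finalizer deletes the VecWrap, which in turn destroys its RObject member.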

Unrelated: consider using { } for your control flow structures, and don't get carried away with this->; both of those will improve the readability of your code IMO.