leodido leodido - 3 months ago 12
R Question

Templated Rcpp function to erase NA values

I would like to write a function (using Rcpp) that removes all the NA values from an R vector.

Before doing so, I wrote a small test function using Rcpp::cppFunction.

library(Rcpp)  # cppFunction is provided by Rcpp
cppFunction('
Vector<INTSXP> na_test(const Vector<INTSXP>& x) {
    return setdiff(x, Vector<INTSXP>::create(::traits::get_na<INTSXP>()));
}
')


It behaves as follows:

na_test(c(1, NA, NA, 1, 2, NA))
# [1] 1 2


After that I tried to generalize this function through the C++ template mechanism.

So, in an external .cpp file (sourced through the sourceCpp function), I've written:

template <int RTYPE>
Vector<RTYPE> na_test_template(const Vector<RTYPE>& x) {
    return setdiff(x, Vector<RTYPE>::create(::traits::get_na<RTYPE>()));
}

// [[Rcpp::export(na_test_cpp)]]
SEXP na_test(SEXP x) {
    switch(TYPEOF(x)) {
    case INTSXP:
        return na_test_template<INTSXP>(x);
    case REALSXP:
        return na_test_template<REALSXP>(x);
    }
    return R_NilValue;
}


This code compiles but behaves differently and I cannot explain why.

In fact:

na_test_cpp(c(1, NA, NA, 1, 2, NA))
# [1] 2 NA NA NA 1


Why does (apparently) the same function behave differently? What is happening here?

Answer

Following up on your answer, I would use something like this as the template:

template <int RTYPE>
Vector<RTYPE> na_omit_template(const Vector<RTYPE>& x) {
    int n = x.size() ;
    int n_out = n - sum( is_na(x) ) ;

    Vector<RTYPE> out(n_out) ;
    for( int i=0, j=0; i<n; i++){
        if( Vector<RTYPE>::is_na( x[i] ) ) continue ;
        out[j++] = x[i];
    }
    return out ;
}

The idea is to compute the length of the result first and then fill an Rcpp vector directly, instead of going through std::vector. This leads to fewer copies of the data.
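For readers who want to try the count-then-fill idea without an Rcpp toolchain, here is a minimal standalone sketch in plain C++. The name `na_omit_sketch` is hypothetical, and missing values are assumed to be modeled as NaN and detected with `std::isnan`, which also treats ordinary NaN values as missing. An explicit missing-value test matters for doubles because NaN never compares equal to itself. That may be why the equality-based setdiff approach in the question left the NA values in place, though this is an inference and not something the thread confirms.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Rcpp template above: plain C++, no Rcpp.
// Assumption: missing values are modeled as NaN and detected with
// std::isnan, which also treats ordinary NaN values as missing.
std::vector<double> na_omit_sketch(const std::vector<double>& x) {
    // Pass 1: count the non-missing entries so the output is sized once.
    std::size_t n_out = 0;
    for (double v : x) {
        if (!std::isnan(v)) ++n_out;
    }

    // Pass 2: copy the non-missing entries, preserving their order.
    std::vector<double> out(n_out);
    std::size_t j = 0;
    for (double v : x) {
        if (std::isnan(v)) continue;
        out[j++] = v;
    }
    return out;
}
```

Unlike a set difference, this keeps duplicates and the original order: the input 1, NA, NA, 1, 2, NA yields 1, 1, 2.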


With the development version of Rcpp (svn revision >= 4308), it works for me for all types, and we can then use our RCPP_RETURN_VECTOR dispatch macro instead of writing the switch:

// [[Rcpp::export]]
SEXP na_omit( SEXP x ){
     RCPP_RETURN_VECTOR( na_omit_template, x ) ;   
}
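The dispatch step can be illustrated outside Rcpp as a switch on a runtime type tag that forwards to the matching template instantiation, which is the hand-written pattern from the question that the macro replaces. The sketch below is plain C++ and does not show the macro's actual expansion. `TypeTag`, `AnyVector`, `traits`, `na_omit_impl` and `na_omit_dispatch` are all hypothetical names, and doubles are assumed to use NaN for missing values.

```cpp
#include <climits>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical type tags standing in for R's INTSXP and REALSXP.
enum TypeTag { INT_TAG, REAL_TAG };

// Hypothetical untyped container standing in for a SEXP: a runtime tag
// plus storage for either element type.
struct AnyVector {
    TypeTag tag;
    std::vector<int> ints;
    std::vector<double> reals;
};

// Per-type traits: where the elements live and how to test for missing.
template <TypeTag TAG> struct traits;

template <> struct traits<INT_TAG> {
    static const std::vector<int>& get(const AnyVector& v) { return v.ints; }
    static std::vector<int>& get(AnyVector& v) { return v.ints; }
    // R encodes NA_integer_ as INT_MIN.
    static bool is_na(int x) { return x == INT_MIN; }
};

template <> struct traits<REAL_TAG> {
    static const std::vector<double>& get(const AnyVector& v) { return v.reals; }
    static std::vector<double>& get(AnyVector& v) { return v.reals; }
    // Assumption: any NaN counts as missing.
    static bool is_na(double x) { return std::isnan(x); }
};

// Typed implementation, instantiated once per supported tag.
template <TypeTag TAG>
AnyVector na_omit_impl(const AnyVector& x) {
    AnyVector out;
    out.tag = TAG;
    for (auto v : traits<TAG>::get(x)) {
        if (!traits<TAG>::is_na(v)) traits<TAG>::get(out).push_back(v);
    }
    return out;
}

// Runtime dispatch: pick the instantiation that matches the tag.
AnyVector na_omit_dispatch(const AnyVector& x) {
    switch (x.tag) {
    case INT_TAG:  return na_omit_impl<INT_TAG>(x);
    case REAL_TAG: return na_omit_impl<REAL_TAG>(x);
    }
    throw std::invalid_argument("unsupported type tag");
}
```

Throwing on an unknown tag is a design choice of this sketch; the switch in the question returns R_NilValue for unsupported types instead.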

na_omit has been included in Rcpp (svn revision >= 4309) with a few modifications: it can handle named vectors and arbitrary sugar expressions.
