aenima aenima - 3 months ago 24
R Question

Rcpp/RcppArmadillo: removing non-contiguous elements from a vector based on positions

Let's say I have a vector [2,4,6,8,10], and I need to remove the 2nd and the 4th elements from this vector. The desired resulting vector should be [2,6,10]. This is very easy to implement in R:

v1 <- c(2,4,6,8,10)
v1[-c(2,4)]


But how do I implement this in Rcpp/RcppArmadillo? I can figure out the contiguous case (i.e. removing the 2nd through the 4th elements) by using the
.erase()
function, but the non-contiguous case doesn't seem so obvious to me since
.erase
does not seem to accept
uvec
type of vectors. Speed could be a consideration because v1 could be quite large in my application.

EDIT: Either Rcpp or Armadillo implementation is fine by me as I am using both.

Answer

Here's one possible approach:

#include <Rcpp.h>

Rcpp::LogicalVector logical_index(Rcpp::IntegerVector idx, R_xlen_t n) {
  bool invert = false; 
  Rcpp::LogicalVector result(n, false);

  for (R_xlen_t i = 0; i < idx.size(); i++) {
    if (!invert && idx[i] < 0) invert = true;
    result[std::abs(idx[i])] = true;
  }

  if (!invert) return result;
  return !result;
}


// [[Rcpp::export]]
Rcpp::NumericVector 
Subset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
  return x[logical_index(idx, x.size())];
}

x <- seq(2, 10, 2)

x[c(2, 4)]
#[1] 4 8
Subset(x, c(1, 3))
#[1] 4 8

x[-c(2, 4)]
#[1]  2  6 10
Subset(x, -c(1, 3))
#[1]  2  6 10 

Note that the indices for the Rcpp function are 0-based, as they are processed in C++.

I abstracted the subsetting logic into its own function, logical_index, which converts an IntegerVector to a LogicalVector in order to be able to "decide" whether to drop or keep the specified elements (e.g. by inverting the result). I suppose this could be done with integer-based subsetting as well, but it should not matter either way.

Like vector subsetting in R, a vector of all negative indices means to drop the corresponding elements; whereas a vector of all positive indices indicates the elements to keep. I did not check for mixed cases, which should probably throw an exception, as R will do.


Regarding my last comment, it would probably be more sensible to rely on Rcpp's native overloads for ordinary subsetting, and have a dedicated function for negated subsetting (R's x[-c(...)] construct), rather than mixing functionality as above. There are pre-existing sugar expressions for creating such a function, e.g.

#include <Rcpp.h>

template <int RTYPE>
inline Rcpp::Vector<RTYPE> 
anti_subset(const Rcpp::Vector<RTYPE>& x, Rcpp::IntegerVector idx) {
  Rcpp::IntegerVector xi = Rcpp::seq(0, x.size() - 1);
  return x[Rcpp::setdiff(xi, idx)];
}

// [[Rcpp::export]]
Rcpp::NumericVector 
AntiSubset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
  return anti_subset(x, idx);
}

/*** R

x <- seq(2, 10, 2)

x[-c(2, 4)]
#[1]  2  6 10

AntiSubset(x, c(1, 3))
#[1]  2  6 10

*/