aenima - 2 months ago 4x
R Question

# Rcpp/RcppArmadillo: removing non-contiguous elements from a vector based on positions

Let's say I have a vector [2,4,6,8,10], and I need to remove the 2nd and the 4th elements from this vector. The desired resulting vector should be [2,6,10]. This is very easy to implement in R:

``````v1 <- c(2,4,6,8,10)
v1[-c(2,4)]
``````

But how do I implement this in Rcpp/RcppArmadillo? I can figure out the contiguous case (i.e. removing the 2nd through the 4th elements) by using the
`.erase()`
function, but the non-contiguous case doesn't seem so obvious to me since
`.erase`
does not seem to accept
`uvec`
type of vectors. Speed could be a consideration because v1 could be quite large in my application.

EDIT: Either Rcpp or Armadillo implementation is fine by me as I am using both.

Here's one possible approach:

``````#include <Rcpp.h>

Rcpp::LogicalVector logical_index(Rcpp::IntegerVector idx, R_xlen_t n) {
bool invert = false;
Rcpp::LogicalVector result(n, false);

for (R_xlen_t i = 0; i < idx.size(); i++) {
if (!invert && idx[i] < 0) invert = true;
result[std::abs(idx[i])] = true;
}

if (!invert) return result;
return !result;
}

// [[Rcpp::export]]
Rcpp::NumericVector
Subset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
return x[logical_index(idx, x.size())];
}
``````

``````x <- seq(2, 10, 2)

x[c(2, 4)]
#[1] 4 8
Subset(x, c(1, 3))
#[1] 4 8

x[-c(2, 4)]
#[1]  2  6 10
Subset(x, -c(1, 3))
#[1]  2  6 10
``````

Note that the indices for the Rcpp function are 0-based, as they are processed in C++.

I abstracted the subsetting logic into its own function, `logical_index`, which converts an `IntegerVector` to a `LogicalVector` in order to be able to "decide" whether to drop or keep the specified elements (e.g. by inverting the result). I suppose this could be done with integer-based subsetting as well, but it should not matter either way.

Like vector subsetting in R, a vector of all negative indices means to drop the corresponding elements; whereas a vector of all positive indices indicates the elements to keep. I did not check for mixed cases, which should probably throw an exception, as R will do.

Regarding my last comment, it would probably be more sensible to rely on Rcpp's native overloads for ordinary subsetting, and have a dedicated function for negated subsetting (R's `x[-c(...)]` construct), rather than mixing functionality as above. There are pre-existing sugar expressions for creating such a function, e.g.

``````#include <Rcpp.h>

template <int RTYPE>
inline Rcpp::Vector<RTYPE>
anti_subset(const Rcpp::Vector<RTYPE>& x, Rcpp::IntegerVector idx) {
Rcpp::IntegerVector xi = Rcpp::seq(0, x.size() - 1);
return x[Rcpp::setdiff(xi, idx)];
}

// [[Rcpp::export]]
Rcpp::NumericVector
AntiSubset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
return anti_subset(x, idx);
}

/*** R

x <- seq(2, 10, 2)

x[-c(2, 4)]
#[1]  2  6 10

AntiSubset(x, c(1, 3))
#[1]  2  6 10

*/
``````