sds sds - 3 months ago 16
R Question

Filter data.table by multiple columns, dynamically

Suppose I have a data.table with a few columns:

library(data.table)
a <- data.table(id=1:1000, x=runif(100), y=runif(100), z=runif(100))


I want to drop the rows where x, y or z is below the median:

a <- a[ x > median(x) & y > median(y) & z > median(z) ]


(Aside: does the above call median 3 times or 3000 times?)
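One way to check this empirically is to count the calls. The sketch below is only an illustration: counted.median and n.calls are hypothetical names introduced here (not part of data.table), and the table uses 1000 random values per column rather than the recycled 100. It relies on the filter expression being evaluated once against whole columns, in which case the counter should end at 3, one call per column.

```r
library(data.table)

set.seed(1)
a <- data.table(id = 1:1000, x = runif(1000), y = runif(1000), z = runif(1000))

# Hypothetical wrapper around median() that records how often it is called.
n.calls <- 0L
counted.median <- function(v) {
  n.calls <<- n.calls + 1L
  median(v)
}

# Same filter as above, with median() swapped for the counting wrapper.
b <- a[x > counted.median(x) & y > counted.median(y) & z > counted.median(z)]

print(n.calls)
```

Each call receives a full column vector, so the comparison is vectorized over rows instead of being repeated per row.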

What I currently do is:

my.cols <- c("x","y","z")
my.meds <- sapply(my.cols, function(n) median(a[[n]]))
a <- a[ Reduce(`&`,Map(function(i) a[[my.cols[i]]] > my.meds[i], seq_along(my.cols))) ]


Is this the best I could do?

Answer

One option is to construct the string you want and eval/parse it:

EVAL = function(...)eval(parse(text=paste0(...)))   # standard helper function

a[ EVAL(my.cols, ">median(", my.cols, ")", collapse=" & ") ]
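If building and parsing a string feels fragile (the parsed expression has to be evaluated in a scope where the columns are visible), another possibility is to let data.table hand over the chosen columns through .SD and .SDcols. This is a minimal sketch, not the approach from the answer above: the names keep and b are illustrative, and the example table uses 1000 random values per column as an assumption.

```r
library(data.table)

set.seed(42)
a <- data.table(id = 1:1000, x = runif(1000), y = runif(1000), z = runif(1000))
my.cols <- c("x", "y", "z")

# For each selected column, build a logical vector "value > column median",
# then combine the vectors with & into a single row filter.
keep <- a[, Reduce(`&`, lapply(.SD, function(v) v > median(v))), .SDcols = my.cols]

# Keep only the rows where every selected column is above its median.
b <- a[keep]
```

Here median() runs once per column inside lapply, and changing the set of filtered columns only requires editing my.cols.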