swolf - 10 months ago 55
R Question

Fast check for each row in data.table if specific columns match a criterion

I just can't get my head around this quite simple task and reading up on related problems on this site did not help.

I created a minimal example:

a <- data.table(n = c("case1", "case2", "case3"), x = c(0,2,5), y = c(1,1,4), z = c(1,1,0))
cols <- c("x", "y", "z")
a
n x y z
1: case1 0 1 1
2: case2 2 1 1
3: case3 5 4 0


All I want to do is to select all rows from
a
were all values in the columns whose names are saved in
cols
are above 0.

So what I want to get in this case is:

       n x y z
2: case2 2 1 1


I used apply in combination with
all()
, but I think there is a much faster way with data.table to do this. My original data is of course much much larger and
cols
contains up 80 column names. Thanks for your help!

We can use Reduce with .SDcols. Specify the columns of interest in .SDcols, loop through the Subset of Data.table (.SD) check whether it is equal 0, get the sum of each row with Reduce, negate (!) to get a logical vector which returns TRUE when there are no 0 elements and use that to subset the rows of 'a'
a[a[, !Reduce(+, lapply(.SD, <=, 0)),.SDcols = cols]]

Or as @Frank mentioned in the comments, pmin can be used as well
a[a[, do.call(pmin, .SD), .SDcols = cols]>0]