swolf - 1 year ago 92
R Question

# Fast check for each row in data.table if specific columns match a criterion

I just can't get my head around this quite simple task and reading up on related problems on this site did not help.

I created a minimal example:

``````a <- data.table(n = c("case1", "case2", "case3"), x = c(0,2,5), y = c(1,1,4), z = c(1,1,0))
cols <- c("x", "y", "z")
a
n x y z
1: case1 0 1 1
2: case2 2 1 1
3: case3 5 4 0
``````

All I want to do is to select all rows from
`a`
were all values in the columns whose names are saved in
`cols`
are above 0.

So what I want to get in this case is:

``````       n x y z
2: case2 2 1 1
``````

I used apply in combination with
`all()`
, but I think there is a much faster way with data.table to do this. My original data is of course much much larger and
`cols`
contains up 80 column names. Thanks for your help!

We can use `Reduce` with `.SDcols`. Specify the columns of interest in `.SDcols`, loop through the Subset of Data.table (`.SD`) check whether it is equal 0, get the sum of each row with `Reduce`, negate (`!`) to get a logical vector which returns TRUE when there are no 0 elements and use that to subset the rows of 'a'
``````a[a[, !Reduce(`+`, lapply(.SD, `<=`, 0)),.SDcols = cols]]
Or as @Frank mentioned in the comments, `pmin` can be used as well
``````a[a[, do.call(pmin, .SD), .SDcols = cols]>0]