user3605620 user3605620 - 2 months ago 6
R Question

Index of an R data.table column with the smallest element

I have a data.table with missing values where some rows only contain NA's. The

data.table
is actually a subset of columns of a larger
data.table
so I'd like to avoid dropping all-NA rows. I want to find an elegant way to find the index of the column with the smallest element in each row and add the resulting array as a new column to the
data.table
. For all-NA rows, I want an NA result. Here is an inelegant way to do so:

> dt <- data.table(x=c(1,NA,3),y=c(2,NA,NA),z=c(3,NA,1))
> dt
x y z
1: 1 2 3
2: NA NA NA
3: 3 NA 1
> w <- apply(dt,1,which.min)
> w
[[1]]
x
1

[[2]]
integer(0)

[[3]]
z
3

> v <- unlist(lapply(w,function(z) ifelse(length(z)==0, NA, z[1])))
> v
[1] 1 NA 3
> dt$idx <- v
> dt
x y z idx
1: 1 2 3 1
2: NA NA NA NA
3: 3 NA 1 3


As you can see, the main reason for inelegance is that
apply
creates a list rather than an array. This happens because results have different length for each row. Also, I am invoking base-R functions
which.min
,
apply
,
unlist
, and
lapply
. Is there a compact
data.table
way to get the desired result?

Answer

using .SD

d[, idx := apply(.SD, 1, which.min), .SDcols = c('x', 'y', 'z')]

However all NA rows would be blank; actually, as 2nd row is all NA, which.min would return integer(0), so that the result of apply is of unequal length and d$idx is a list (second element of which is a zero-length vector);

   > d
    x  y  z idx
1:  1  2  3   1
2: NA NA NA    
3:  3 NA  1   3

> d$idx
[[1]]
x 
1 

[[2]]
integer(0)

[[3]]
z 
3 

So to handle zero-length vectors and set return to NA in those cases;

d[, idx := apply(.SD, 1, function(x) which.min(x)[1] ), .SDcols = c('x', 'y', 'z')]
> d$idx
[1]  1 NA  3