user3605620 - 10 months ago 42
R Question

# Index of an R data.table column with the smallest element

I have a data.table with missing values where some rows only contain NA's. The

data.table
is actually a subset of columns of a larger
data.table
so I'd like to avoid dropping all-NA rows. I want to find an elegant way to find the index of the column with the smallest element in each row and add the resulting array as a new column to the
data.table
. For all-NA rows, I want an NA result. Here is an inelegant way to do so:

> dt <- data.table(x=c(1,NA,3),y=c(2,NA,NA),z=c(3,NA,1))
> dt
x  y  z
1:  1  2  3
2: NA NA NA
3:  3 NA  1
> w <- apply(dt,1,which.min)
> w
[[1]]
x
1

[[2]]
integer(0)

[[3]]
z
3

> v <- unlist(lapply(w,function(z) ifelse(length(z)==0, NA, z[1])))
> v
[1]  1 NA  3
> dt$idx <- v > dt x y z idx 1: 1 2 3 1 2: NA NA NA NA 3: 3 NA 1 3  As you can see, the main reason for inelegance is that apply creates a list rather than an array. This happens because results have different length for each row. Also, I am invoking base-R functions which.min , apply , unlist , and lapply . Is there a compact data.table way to get the desired result? Answer Source using .SD d[, idx := apply(.SD, 1, which.min), .SDcols = c('x', 'y', 'z')]  However all NA rows would be blank; actually, as 2nd row is all NA, which.min would return integer(0), so that the result of apply is of unequal length and d$idx is a list (second element of which is a zero-length vector);

   > d
x  y  z idx
1:  1  2  3   1
2: NA NA NA
3:  3 NA  1   3

> d$idx [[1]] x 1 [[2]] integer(0) [[3]] z 3  So to handle zero-length vectors and set return to NA in those cases; d[, idx := apply(.SD, 1, function(x) which.min(x)[1] ), .SDcols = c('x', 'y', 'z')] > d$idx
[1]  1 NA  3