user3605620 - 1 year ago 66
R Question

# Index of an R data.table column with the smallest element

I have a data.table with missing values where some rows only contain NA's. The

`data.table`
is actually a subset of columns of a larger
`data.table`
so I'd like to avoid dropping all-NA rows. I want to find an elegant way to find the index of the column with the smallest element in each row and add the resulting array as a new column to the
`data.table`
. For all-NA rows, I want an NA result. Here is an inelegant way to do so:

``````> dt <- data.table(x=c(1,NA,3),y=c(2,NA,NA),z=c(3,NA,1))
> dt
x  y  z
1:  1  2  3
2: NA NA NA
3:  3 NA  1
> w <- apply(dt,1,which.min)
> w
[[1]]
x
1

[[2]]
integer(0)

[[3]]
z
3

> v <- unlist(lapply(w,function(z) ifelse(length(z)==0, NA, z[1])))
> v
[1]  1 NA  3
> dt\$idx <- v
> dt
x  y  z idx
1:  1  2  3   1
2: NA NA NA  NA
3:  3 NA  1   3
``````

As you can see, the main reason for inelegance is that
`apply`
creates a list rather than an array. This happens because results have different length for each row. Also, I am invoking base-R functions
`which.min`
,
`apply`
,
`unlist`
, and
`lapply`
. Is there a compact
`data.table`
way to get the desired result?

using .SD

``````d[, idx := apply(.SD, 1, which.min), .SDcols = c('x', 'y', 'z')]
``````

However all NA rows would be blank; actually, as 2nd row is all NA, which.min would return `integer(0)`, so that the result of `apply` is of unequal length and d\$idx is a list (second element of which is a zero-length vector);

``````   > d
x  y  z idx
1:  1  2  3   1
2: NA NA NA
3:  3 NA  1   3

> d\$idx
[[1]]
x
1

[[2]]
integer(0)

[[3]]
z
3
``````

So to handle zero-length vectors and set return to NA in those cases;

``````d[, idx := apply(.SD, 1, function(x) which.min(x)[1] ), .SDcols = c('x', 'y', 'z')]
> d\$idx
[1]  1 NA  3
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download