user3605620 - 4 months ago 14

R Question

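I have a `data.table` with missing values, where some rows contain only `NA`s. For each row I want to add a column `idx` holding the column index of the row's minimum (or `NA` when the whole row is `NA`). This is my current approach: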

```
> dt <- data.table(x=c(1,NA,3),y=c(2,NA,NA),z=c(3,NA,1))
> dt
    x  y  z
1:  1  2  3
2: NA NA NA
3:  3 NA  1
> w <- apply(dt,1,which.min)
> w
[[1]]
x
1

[[2]]
integer(0)

[[3]]
z
3

> v <- unlist(lapply(w,function(z) ifelse(length(z)==0, NA, z[1])))
> v
[1]  1 NA  3
> dt$idx <- v
> dt
    x  y  z idx
1:  1  2  3   1
2: NA NA NA  NA
3:  3 NA  1   3
```

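As you can see, the main source of inelegance is that `which.min` returns `integer(0)` for the all-`NA` row, so `apply` returns a list instead of a vector, which I then have to clean up with `lapply` and `unlist`. Is there a more elegant, idiomatic `data.table` way to do this?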

Answer

Using `.SD` (here `d` is the `dt` from the question):

```
d[, idx := apply(.SD, 1, which.min), .SDcols = c('x', 'y', 'z')]
```

However, all-`NA` rows come out blank. Because the second row is entirely `NA`, `which.min` returns `integer(0)` for it, so the per-row results of `apply` have unequal lengths, `apply` returns a list, and `d$idx` becomes a list column whose second element is a zero-length vector:

```
> d
x y z idx
1: 1 2 3 1
2: NA NA NA
3: 3 NA 1 3
> d$idx
[[1]]
x
1
[[2]]
integer(0)
[[3]]
z
3
```
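The same behaviour can be reproduced in base R on a plain matrix, which is what `apply` coerces a `data.table` to via `as.matrix` anyway. This is a minimal sketch assuming only base R; the matrix `m` is a hypothetical stand-in that mirrors the question's data.

```r
# Same data as the question, as a plain numeric matrix with named columns
# (hypothetical stand-in for the data.table).
m <- cbind(x = c(1, NA, 3), y = c(2, NA, NA), z = c(3, NA, 1))

# which.min() skips NAs; for a row that is entirely NA it returns integer(0).
w <- apply(m, 1, which.min)

# The per-row results have lengths 1, 0, 1, so apply() cannot simplify
# them to a vector and returns a list instead.
is.list(w)   # TRUE
lengths(w)   # 1 0 1
```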

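To handle the zero-length results and return `NA` in those cases, take the first element of whatever `which.min` returns; indexing past the end of a vector gives `NA`, so `integer(0)[1]` is `NA`: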

```
> d[, idx := apply(.SD, 1, function(x) which.min(x)[1]), .SDcols = c('x', 'y', 'z')]
> d$idx
[1] 1 NA 3
```
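The per-row fix can be checked the same way in base R. A minimal sketch, again on a hypothetical plain matrix `m` mirroring the question's data rather than on the `data.table` itself: indexing with `[1]` guarantees exactly one value per row, so `apply` can simplify its result to a vector.

```r
# Hypothetical stand-in for the question's data.table.
m <- cbind(x = c(1, NA, 3), y = c(2, NA, NA), z = c(3, NA, 1))

# Indexing past the end of a zero-length vector yields NA.
integer(0)[1]   # NA

# which.min(x)[1] is always length 1: the column index of the minimum,
# or NA for an all-NA row, so apply() returns a plain integer vector.
idx <- apply(m, 1, function(x) which.min(x)[1])
idx             # 1 NA 3
```

The `data.table` version in the answer wraps the same per-row function in `:=`, with `.SDcols` selecting which columns are passed to `apply`.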