nfmcclure - 1 year ago 91
R Question

# How to create mean and s.d. columns in data.table

The following code and its output baffle me: why does data.table return NA for the mean function but not for the sd function?

``````
library(data.table)
test <- data.frame('id'=c(1,2,3,4,5),
                   'A'=seq(2,9,length=5),
                   'B'=seq(3,9,length=5),
                   'C'=seq(4,9,length=5),
                   'D'=seq(5,9,length=5))

test <- as.data.table(test)

test[,`:=`(mean_test = mean(.SD), sd_test = sd(.SD)),by=id,.SDcols=c('A','B','C','D')]
> test
   id    A   B    C D mean_test   sd_test
1:  1 2.00 3.0 4.00 5        NA 1.2909944
2:  2 3.75 4.5 5.25 6        NA 0.9682458
3:  3 5.50 6.0 6.50 7        NA 0.6454972
4:  4 7.25 7.5 7.75 8        NA 0.3227486
5:  5 9.00 9.0 9.00 9        NA 0.0000000
``````

I've learned quite a bit searching around, going through the DT tutorials/examples. This question is very similar to what I was hoping to do.

Why does the standard deviation function work and the mean function return NA?

Edit: Using Ricardo Saporta's solution:

``````
test[,`:=`(mean_test = apply(.SD, 1, mean), sd_test = apply(.SD, 1, sd)),by=id,.SDcols=c('A','B','C','D')]

> test
   id    A   B    C D mean_test   sd_test
1:  1 2.00 3.0 4.00 5     3.500 1.2909944
2:  2 3.75 4.5 5.25 6     4.875 0.9682458
3:  3 5.50 6.0 6.50 7     6.250 0.6454972
4:  4 7.25 7.5 7.75 8     7.625 0.3227486
5:  5 9.00 9.0 9.00 9     9.000 0.0000000
``````

`.SD` is itself a `data.table`.
Thus, when you take `mean(.SD)`, you are attempting to take the mean of an entire data.table.

The function `mean()` does not know what to do with a data.table and returns `NA`.

### Have a look

``````
## the .SD in your question is the same as
test[, c('A','B','C','D'), with=FALSE]

## try taking its mean
mean(test[, c('A','B','C','D'), with=FALSE])

# Warning in mean.default(test[, c("A", "B", "C", "D"), with = FALSE]) :
#   argument is not numeric or logical: returning NA
# [1] NA
``````
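As for why `sd()` still returns a number: as far as I can tell from its definition in the stats package, `sd()` passes anything that is not a plain vector through `as.double()` before computing the variance. With `by=id` every group here has exactly one row, so `.SD` is a list of four length-one columns, and `as.double()` can flatten that into a numeric vector. `mean()` does no such coercion. A small sketch of that behaviour (the names `one_row` and `two_rows` are made up for illustration):

```r
library(data.table)

# one_row mimics the .SD of the first group (id == 1) in the question
one_row <- data.table(A = 2, B = 3, C = 4, D = 5)

# a one-row table can be coerced to a plain numeric vector ...
row_values <- as.double(one_row)

# ... which is what lets sd() return a value for it
sd_one_row <- sd(one_row)

# mean() performs no coercion: it warns and returns NA
mean_one_row <- suppressWarnings(mean(one_row))

# with more than one row per group the coercion fails, so sd() errors as well
two_rows <- data.table(A = c(2, 3.75), B = c(3, 4.5))
sd_two_rows <- tryCatch(sd(two_rows), error = function(e) "error")
```

So the `sd_test` column in the question is arguably a lucky side effect of one-row groups rather than something to rely on.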

Use `lapply(.SD, mean)` for column-wise means or `apply(.SD, 1, mean)` for row-wise means.
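Here is a minimal sketch of both variants on the data from the question. The helper names `cols`, `col_means` and `mean_alt` are my own, and `rowMeans()` is shown only as a possible base R alternative for the row-wise mean, not as something the original code used:

```r
library(data.table)

# same values as in the question
test <- data.table(id = 1:5,
                   A = seq(2, 9, length.out = 5),
                   B = seq(3, 9, length.out = 5),
                   C = seq(4, 9, length.out = 5),
                   D = seq(5, 9, length.out = 5))
cols <- c("A", "B", "C", "D")

# column-wise: one mean per column, returned as a one-row data.table
col_means <- test[, lapply(.SD, mean), .SDcols = cols]

# row-wise: one mean and one sd per row, added by reference
test[, `:=`(mean_test = apply(.SD, 1, mean),
            sd_test   = apply(.SD, 1, sd)),
     .SDcols = cols]

# rowMeans() is a vectorised alternative for the row-wise mean
test[, mean_alt := rowMeans(.SD), .SDcols = cols]
```

Because `apply(.SD, 1, ...)` already works row by row, `by=id` is not needed in this sketch; keeping it, as in the question's edit, should give the same result when each `id` occurs once.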