YBA YBA - 1 year ago 97
R Question

Standardize by group using data.table

Is it possible to use data.table to standardize a number of variables by a number of group variables?

DT <- data.table(V1=1:20, V2=40:21, gr=c(rep(c('a'),10), rep(c('b'),10)),
grr=rep(c(rep(c('a'),5), rep(c('b'),5)),2))

gr and grr are the group variables. I want to add to that data.table V1.z and V2.z that are the standardized score within each gr-by-grr group.

Here is an extremely stupid code for that, to explain what I want:

DTaa <- DT[gr=='a' & grr=='a',]
DTab <- DT[gr=='a' & grr=='b',]
DTba <- DT[gr=='b' & grr=='a',]
DTbb <- DT[gr=='b' & grr=='b',]
DTaa <- DTaa[,V1.z := scale(V1)]
DTaa <- DTaa[,V2.z := scale(V2)]
DTab <- DTab[,V1.z := scale(V1)]
DTab <- DTab[,V2.z := scale(V2)]
DTba <- DTba[,V1.z := scale(V1)]
DTba <- DTba[,V2.z := scale(V2)]
DTbb <- DTbb[,V1.z := scale(V1)]
DTbb <- DTbb[,V2.z := scale(V2)]
DTn <- rbind(DTaa, DTab, DTba, DTbb)

Probably, there is a way to do it using
in one or two lines.

  • I'm hoping to then use it in a function that accepts data, the target variables (in the example, V1 and V2), and group variables (in the example, gr and grr) as arguments.

  • If you have a solution that does not use data.table, it's also good (I tried using mutate_at from dplyr but couldn't find much documentation about that function).

Answer Source

After grouping by 'gr' and 'grr', loop over the Subset of Data.table (.SD), scale it (the output of scale is a matrix, so we convert it to vector with as.vector) and assign (:=) the output to the new columns.

DT[, paste0(names(DT)[1:2], ".z") := lapply(.SD, 
                  function(x) as.vector(scale(x))), .(gr, grr)]
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download