YBA - 1 year ago 97
R Question

# Standardize by group using data.table

Is it possible to use data.table to standardize a number of variables by a number of group variables?

DT <- data.table(V1=1:20, V2=40:21, gr=c(rep(c('a'),10), rep(c('b'),10)),
grr=rep(c(rep(c('a'),5), rep(c('b'),5)),2))

gr and grr are the group variables. I want to add to that data.table V1.z and V2.z that are the standardized score within each gr-by-grr group.

Here is an extremely stupid code for that, to explain what I want:

DTaa <- DT[gr=='a' & grr=='a',]
DTab <- DT[gr=='a' & grr=='b',]
DTba <- DT[gr=='b' & grr=='a',]
DTbb <- DT[gr=='b' & grr=='b',]
DTaa <- DTaa[,V1.z := scale(V1)]
DTaa <- DTaa[,V2.z := scale(V2)]
DTab <- DTab[,V1.z := scale(V1)]
DTab <- DTab[,V2.z := scale(V2)]
DTba <- DTba[,V1.z := scale(V1)]
DTba <- DTba[,V2.z := scale(V2)]
DTbb <- DTbb[,V1.z := scale(V1)]
DTbb <- DTbb[,V2.z := scale(V2)]
DTn <- rbind(DTaa, DTab, DTba, DTbb)

Probably, there is a way to do it using
by
in one or two lines.

• I'm hoping to then use it in a function that accepts data, the target variables (in the example, V1 and V2), and group variables (in the example, gr and grr) as arguments.

• If you have a solution that does not use data.table, it's also good (I tried using mutate_at from dplyr but couldn't find much documentation about that function).