YBA - 5 months ago 51

R Question

Is it possible to use data.table to standardize a number of variables by a number of group variables?

`DT <- data.table(V1=1:20, V2=40:21, gr=c(rep(c('a'),10), rep(c('b'),10)),`

grr=rep(c(rep(c('a'),5), rep(c('b'),5)),2))

gr and grr are the group variables. I want to add to that data.table V1.z and V2.z that are the standardized score within each gr-by-grr group.

Here is an extremely stupid code for that, to explain what I want:

`DTaa <- DT[gr=='a' & grr=='a',]`

DTab <- DT[gr=='a' & grr=='b',]

DTba <- DT[gr=='b' & grr=='a',]

DTbb <- DT[gr=='b' & grr=='b',]

DTaa <- DTaa[,V1.z := scale(V1)]

DTaa <- DTaa[,V2.z := scale(V2)]

DTab <- DTab[,V1.z := scale(V1)]

DTab <- DTab[,V2.z := scale(V2)]

DTba <- DTba[,V1.z := scale(V1)]

DTba <- DTba[,V2.z := scale(V2)]

DTbb <- DTbb[,V1.z := scale(V1)]

DTbb <- DTbb[,V2.z := scale(V2)]

DTn <- rbind(DTaa, DTab, DTba, DTbb)

Probably, there is a way to do it using

`by`

- I'm hoping to then use it in a function that accepts data, the target variables (in the example, V1 and V2), and group variables (in the example, gr and grr) as arguments.
- If you have a solution that does not use data.table, it's also good (I tried using mutate_at from dplyr but couldn't find much documentation about that function).

Answer

After grouping by 'gr' and 'grr', loop over the Subset of Data.table (`.SD`

), `scale`

it (the output of `scale`

is a `matrix`

, so we convert it to `vector`

with `as.vector`

) and assign (`:=`

) the output to the new columns.

```
DT[, paste0(names(DT)[1:2], ".z") := lapply(.SD,
function(x) as.vector(scale(x))), .(gr, grr)]
```