Connor J Connor J - 1 month ago 6
R Question

Using with = FALSE and .SD to subset ROWS of .SD

I have a data.table with multiple variables which take a limited number of values. For each variable, I want to build a table counting the fraction of observations that have each value, with grouping.

Example data:

data = data.table(group = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3),
Var1 = c(1,1,2,2,3,1,2,3,2,2,3,1,2,3,1),
Var2 = c(1,1,2,3,2,2,2,2,1,2,3,2,1,1,3))


I would like two resulting tables:

Var1:
group | Var1_1 | Var1_2 | Var1_3
------|---------|---------|---------
1 | .4 | .4 | .2
2 | .2 | .6 | .2
3 | .4 | .2 | .4


and similarly for
Var2
. This is straightforward for a single variable:

data[, .(Var1_1 = .SD[Var1 == 1, .N]/.N,
Var1_2 = .SD[Var1 == 2, .N]/.N,
Var1_3 = .SD[Var1 == 3, .N]/.N), by = group, .SDcols = "Var1"]


But since I have many variables, I'd like to do this in a loop. I've played around with using
with = FALSE
both in the outer call and the
.SD
call, but haven't been able to get it right. Ideally I can pass the column name to
.SD
's
i
subset,
.SDcols
, and, ideally, as an argument to
paste0()
to create the output variable names (e.g.
paste0(varName, "_1")
).

variables <- c("Var1", "Var2")
for(variable in variables) {
data[...what should go here?...]
}


Any suggestions on how to do this?

Answer

table and prop.table are usually handy for constructing such proportional tables.

lapply(variables, function(var) data[, prop.table(table(group, get(var)), margin = 1)])

#[[1]]

#group   1   2   3
#    1 0.4 0.4 0.2
#    2 0.2 0.6 0.2
#    3 0.4 0.2 0.4

#[[2]]

#group   1   2   3
#    1 0.4 0.4 0.2
#    2 0.2 0.8 0.0
#    3 0.4 0.2 0.4
Comments