Connor J - 7 months ago 27
R Question

# Using with = FALSE and .SD to subset ROWS of .SD

I have a data.table with multiple variables which take a limited number of values. For each variable, I want to build a table counting the fraction of observations that have each value, with grouping.

Example data:

``````data = data.table(group = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3),
Var1 =  c(1,1,2,2,3,1,2,3,2,2,3,1,2,3,1),
Var2 =  c(1,1,2,3,2,2,2,2,1,2,3,2,1,1,3))
``````

I would like two resulting tables:

``````Var1:
group | Var1_1  | Var1_2  | Var1_3
------|---------|---------|---------
1     | .4      | .4      | .2
2     | .2      | .6      | .2
3     | .4      | .2      | .4
``````

and similarly for
`Var2`
. This is straightforward for a single variable:

``````data[, .(Var1_1 = .SD[Var1 == 1, .N]/.N,
Var1_2 = .SD[Var1 == 2, .N]/.N,
Var1_3 = .SD[Var1 == 3, .N]/.N), by = group, .SDcols = "Var1"]
``````

But since I have many variables, I'd like to do this in a loop. I've played around with using
`with = FALSE`
both in the outer call and the
`.SD`
call, but haven't been able to get it right. Ideally I can pass the column name to
`.SD`
's
`i`
subset,
`.SDcols`
, and, ideally, as an argument to
`paste0()`
to create the output variable names (e.g.
`paste0(varName, "_1")`
).

``````variables <- c("Var1", "Var2")
for(variable in variables) {
data[...what should go here?...]
}
``````

Any suggestions on how to do this?

`table` and `prop.table` are usually handy for constructing such proportional tables.

``````lapply(variables, function(var) data[, prop.table(table(group, get(var)), margin = 1)])

#[[1]]

#group   1   2   3
#    1 0.4 0.4 0.2
#    2 0.2 0.6 0.2
#    3 0.4 0.2 0.4

#[[2]]

#group   1   2   3
#    1 0.4 0.4 0.2
#    2 0.2 0.8 0.0
#    3 0.4 0.2 0.4
``````
Source (Stackoverflow)