Connor J - 8 months ago 32

R Question

I want to create multiple variables that are aggregating various subsets of a dataset. For an illustrating example, say you have the following data:

`DT = data.table(Group1 = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),`

Group2 = c(1,1,1,2,2,1,1,2,2,2,1,1,1,1,2,1,1,2,2,2),

Var1 = c(1,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0))

I want to find several averages of variable

`Var1`

- grouped by
`mean(Var1)`

`Group1`

- for only those with
`mean(Var1)`

, grouped by`Group2 == 1`

`Group1`

- for only those with
`mean(Var1)`

, grouped by`Group2 == 2`

`Group1`

Or, in data.table parlance,

`DT[, mean(Var1), by=Group1]`

DT[Group2==1, mean(Var1), by=Group1]

DT[Group2==2, mean(Var1), by=Group1]

Obviously, calculating any one of these is very straightforward. But I can't find a good way to calculate all three of them, since they use different subsets in

`i`

`DT_all <- DT[, .(avgVar1_all = mean(Var1)), by = Group1]`

DT_1 <- DT[Group2 == 1, .(avgVar1_1 = mean(Var1)), by = Group1]

DT_2 <- DT[Group2 == 2, .(avgVar1_2 = mean(Var1)), by = Group1]

group_info <- merge(DT_all, DT_1, by = "Group1")

group_info <- merge(group_info, DT_2, by = "Group1")

group_info

# Group1 avgVar1_all avgVar1_1 avgVar1_2

# 1: 1 0.4 0.6666667 0.0000000

# 2: 2 0.6 1.0000000 0.3333333

# 3: 3 0.2 0.2500000 0.0000000

# 4: 4 0.0 0.0000000 0.0000000

Is there a more elegant method I could be using?

Answer

Just do it all in one grouping operation using `.SD`

:

```
DT[, .(
all = mean(Var1),
grp1 = .SD[Group2==1, mean(Var1)],
grp2 = .SD[Group2==2, mean(Var1)]
),
by = Group1]
# Group1 all grp1 grp2
#1: 1 0.4 0.6666667 0.0000000
#2: 2 0.6 1.0000000 0.3333333
#3: 3 0.2 0.2500000 0.0000000
#4: 4 0.0 0.0000000 0.0000000
```