Connor J Connor J - 2 months ago 8
R Question

How to create multiple new columns with data.table's j over different subsets

I want to create multiple variables that are aggregating various subsets of a dataset. For an illustrating example, say you have the following data:

DT = data.table(Group1 = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
Group2 = c(1,1,1,2,2,1,1,2,2,2,1,1,1,1,2,1,1,2,2,2),
Var1 = c(1,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0))


I want to find several averages of variable
Var1
. I want to know:


  • mean(Var1)
    grouped by
    Group1

  • mean(Var1)
    for only those with
    Group2 == 1
    , grouped by
    Group1

  • mean(Var1)
    for only those with
    Group2 == 2
    , grouped by
    Group1



Or, in data.table parlance,

DT[, mean(Var1), by=Group1]
DT[Group2==1, mean(Var1), by=Group1]
DT[Group2==2, mean(Var1), by=Group1]


Obviously, calculating any one of these is very straightforward. But I can't find a good way to calculate all three of them, since they use different subsets in
i
. The solution I've been using so far is generating them individually, then merging them into a unified table.

DT_all <- DT[, .(avgVar1_all = mean(Var1)), by = Group1]
DT_1 <- DT[Group2 == 1, .(avgVar1_1 = mean(Var1)), by = Group1]
DT_2 <- DT[Group2 == 2, .(avgVar1_2 = mean(Var1)), by = Group1]
group_info <- merge(DT_all, DT_1, by = "Group1")
group_info <- merge(group_info, DT_2, by = "Group1")

group_info
# Group1 avgVar1_all avgVar1_1 avgVar1_2
# 1: 1 0.4 0.6666667 0.0000000
# 2: 2 0.6 1.0000000 0.3333333
# 3: 3 0.2 0.2500000 0.0000000
# 4: 4 0.0 0.0000000 0.0000000


Is there a more elegant method I could be using?

Answer

Just do it all in one grouping operation using .SD:

DT[, .(
        all  = mean(Var1),
        grp1 = .SD[Group2==1, mean(Var1)],
        grp2 = .SD[Group2==2, mean(Var1)]
      ),
  by = Group1]

#   Group1 all      grp1      grp2
#1:      1 0.4 0.6666667 0.0000000
#2:      2 0.6 1.0000000 0.3333333
#3:      3 0.2 0.2500000 0.0000000
#4:      4 0.0 0.0000000 0.0000000
Comments