mathematical.coffee - 9 days ago
R Question

Use a factor column in "by" and do not drop empty factors

Suppose I have a data.table:

x <- data.table(x=runif(3), group=factor(c('a','b','a'), levels=c('a','b','c')))


I want to know how many rows in x exist for each group:

x[, .N, by="group"]
# group N
# 1: a 2
# 2: b 1


Question: is there some way to force the above by="group" to consider all levels of the factor group?


Notice that since I don't have any rows with group 'c' in the table, I don't get a row for c.

Desired output:

x[, .N, by="group", ???] # somehow use all levels in `group`
# group N
# 1: a 2
# 2: b 1
# 3: c 0

Answer

If you are willing to enumerate the factor levels in i (rather than setting by="group"), this will get you the hoped-for result. With by=.EACHI, .N is computed once for each value supplied in i, so a level with no matching rows gets a count of 0.

setkey(x, "group")
x[levels(group), .N, by=.EACHI]
#    group N
# 1:     a 2
# 2:     b 1
# 3:     c 0
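
If you would rather not call setkey (it reorders x by reference), a variant of the same idea is to join with the on= argument. This is only a sketch, assuming data.table 1.9.6 or later (where on= is available). The name counts is just illustrative, and levels(x$group) is written out in full here purely to be explicit about where the levels come from. The base R table() call at the end is a quick cross-check, since table() counts every level of a factor.

```r
library(data.table)

# Same example table as in the question
x <- data.table(x = runif(3),
                group = factor(c("a", "b", "a"), levels = c("a", "b", "c")))

# Join on the factor levels without setting a key.
# by = .EACHI evaluates .N once per row of i, so a level
# with no matching rows in x is reported with a count of 0.
counts <- x[.(group = levels(x$group)), .N, on = "group", by = .EACHI]
counts

# Cross-check with base R: table() includes empty factor levels
table(x$group)
```

Either way the key point is the same: the levels go in i, and by=.EACHI does the per-level counting.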