R Question

Use a factor column in "by" and do not drop empty factors

Suppose I have a data.table:

library(data.table)
x <- data.table(x=runif(3), group=factor(c('a','b','a'), levels=c('a','b','c')))

I want to know how many rows in `x` exist for each `group`:

x[, .N, by="group"]
#    group N
# 1:     a 2
# 2:     b 1

Question: is there some way to force the above call to consider all levels of the factor `group`?

Notice that because the table has no rows with 'c', I don't get a row for c.

Desired output:

x[, .N, by="group", ???] # somehow use all levels in `group`
#    group N
# 1:     a 2
# 2:     b 1
# 3:     c 0


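If you are willing to enumerate the factor levels in i (rather than setting by="group"), this will get you the hoped-for result. With by=.EACHI, j is evaluated once for each row of i, so a level with no matching rows still gets a row, with .N equal to 0.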

setkey(x, "group")
x[levels(group), .N, by=.EACHI]
#    group N
# 1:     a 2
# 2:     b 1
# 3:     c 0
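
If you would rather not set a key on the table, the same idea can probably be written as an ad hoc join with `on=`. The following is a minimal sketch, not the only way to do it. It builds its own table with the illustrative names `dt`, `value`, and `lv` (none of these come from the question), partly so that the table does not share a name with one of its columns. The `table()` call at the end is just a base R cross-check.

```r
library(data.table)

# Same shape as the table in the question; `dt`, `value`, and `lv`
# are illustrative names chosen for this sketch.
dt <- data.table(value = runif(3),
                 group = factor(c("a", "b", "a"), levels = c("a", "b", "c")))

# Collect every level of the factor, including the unused one.
lv <- levels(dt$group)

# Join on `group` without setting a key. by = .EACHI evaluates .N once
# per row of i, so a level with no matching rows is reported with N = 0.
counts <- dt[.(group = lv), on = "group", .N, by = .EACHI]

# Base R cross-check: table() on a factor counts every level,
# including levels that do not occur in the data.
base_counts <- table(dt$group)

print(counts)
print(base_counts)
```

Unlike `setkey()`, the `on=` form leaves the row order of the table untouched, which may matter if later code depends on that order.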