juba juba - 8 days ago 6
R Question

Keeping 0 count combinations when aggregating with data.table

Suppose I've got the following

data.table
:

dt <- data.table(id=c(rep(1,5),rep(2,4)),
sex=c(rep("H",5),rep("F",4)),
fruit=c("apple","tomato","apple","apple","orange","apple","apple","tomato","tomato"),
key="id")

id sex fruit
1: 1 H apple
2: 1 H tomato
3: 1 H apple
4: 1 H apple
5: 1 H orange
6: 2 F apple
7: 2 F apple
8: 2 F tomato
9: 2 F tomato


Each row represents the fact that someone (identified by it's
id
and
sex
) ate a
fruit
. I want to count the number of times each
fruit
has been eaten by
sex
. I can do it with :

dt[,.N,by=c("fruit","sex")]


Which gives :

fruit sex N
1: apple H 3
2: tomato H 1
3: orange H 1
4: apple F 2
5: tomato F 2


The problem is, by doing it this way I'm losing the count of
orange
for
sex==F
, because this count is 0. Is there a way to do this aggregation without loosing O-count combinations ?

To be perfectly clear, the desired result would be the following :

fruit sex N
1: apple H 3
2: tomato H 1
3: orange H 1
4: apple F 2
5: tomato F 2
6: orange F 0


Thanks a lot !

Answer

Seems like the most straightforward approach is to explicitly supply all category combos in a data.table passed to i=, setting by=.EACHI to iterate over them:

setkey(dt,sex,fruit)[CJ(unique(sex), unique(fruit)), .N, by=.EACHI]
#    sex  fruit N
# 1:   F  apple 2
# 2:   F orange 0
# 3:   F tomato 2
# 4:   H  apple 3
# 5:   H orange 1
# 6:   H tomato 1