juba - 1 year ago 63
R Question

# Keeping 0 count combinations when aggregating with data.table

Suppose I've got the following

`data.table`
:

``````dt <- data.table(id=c(rep(1,5),rep(2,4)),
sex=c(rep("H",5),rep("F",4)),
fruit=c("apple","tomato","apple","apple","orange","apple","apple","tomato","tomato"),
key="id")

id sex  fruit
1:  1   H  apple
2:  1   H tomato
3:  1   H  apple
4:  1   H  apple
5:  1   H orange
6:  2   F  apple
7:  2   F  apple
8:  2   F tomato
9:  2   F tomato
``````

Each row represents the fact that someone (identified by it's
`id`
and
`sex`
) ate a
`fruit`
. I want to count the number of times each
`fruit`
has been eaten by
`sex`
. I can do it with :

``````dt[,.N,by=c("fruit","sex")]
``````

Which gives :

``````    fruit sex N
1:  apple   H 3
2: tomato   H 1
3: orange   H 1
4:  apple   F 2
5: tomato   F 2
``````

The problem is, by doing it this way I'm losing the count of
`orange`
for
`sex==F`
, because this count is 0. Is there a way to do this aggregation without loosing O-count combinations ?

To be perfectly clear, the desired result would be the following :

``````   fruit sex N
1:  apple   H 3
2: tomato   H 1
3: orange   H 1
4:  apple   F 2
5: tomato   F 2
6: orange   F 0
``````

Thanks a lot !

Seems like the most straightforward approach is to explicitly supply all category combos in a data.table passed to `i=`, setting `by=.EACHI` to iterate over them:
``````setkey(dt,sex,fruit)[CJ(unique(sex), unique(fruit)), .N, by=.EACHI]