shadow shadow - 29 days ago 7
R Question

rbindlist for factors with missing levels

I have several data.tables that I would like to combine with rbindlist. The tables contain factors with (possibly missing) levels, i.e. levels that do not occur in the data. In that case, rbindlist(...) behaves differently from do.call(rbind, ...):

library(data.table)

dt1 <- data.table(x=factor(c("a", "b"), levels=letters))

rbindlist(list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b

do.call(rbind, list(dt1, dt1))[,x]
## [1] a b a b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z


If I want to keep the levels, do I have to resort to rbind, or is there a data.table way?

Answer

I guess rbindlist is faster because it skips the checking that do.call(rbind.data.frame, ...) performs.

Why not set the levels after binding?

    Dt <- rbindlist(list(dt1, dt1))
    setattr(Dt$x, "levels", letters)  ## set attribute without a copy

From ?setattr:

setattr() is useful in many situations to set attributes by reference and can be used on any object or part of an object, not just data.tables.
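
One caveat: setattr replaces the level labels but leaves the underlying integer codes untouched. It therefore only gives the right values when the levels that survive the bind line up positionally with the start of the full level set, as "a" and "b" do here. Below is a minimal sketch of a label-based alternative. The table dt2 and its values are hypothetical, made up for illustration, and this version rebuilds the column rather than modifying it by reference:

```r
library(data.table)

# Hypothetical example: the used levels ("b", "d") are NOT the first entries of `letters`,
# so relabelling the codes by position would not be safe here.
dt2 <- data.table(x = factor(c("b", "d"), levels = letters))
Dt2 <- rbindlist(list(dt2, dt2))

# Re-encode by label: factor() matches the character values against `letters`,
# so the result is correct whether or not unused levels were dropped by the bind.
Dt2[, x := factor(x, levels = letters)]

print(nlevels(Dt2$x))        # all 26 levels are present again
print(as.character(Dt2$x))   # the values are still b, d, b, d
```

The trade-off is one extra pass over the column, which may matter for very large tables; when the retained levels are known to be a prefix of the full set, the setattr approach above avoids that copy.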