Chinmay Patil - 29 days ago
R Question

Why is rbindlist "better" than rbind?

I am going through the documentation of data.table and also noticed from some of the conversations over here on SO that rbindlist is supposed to be better than rbind.

I would like to know why rbindlist is better than rbind, and in which scenarios rbindlist really excels over rbind.

Is there any advantage in terms of memory utilization?


rbindlist is an optimized version of do.call(rbind, list(...)), which is known for being slow when using rbind.data.frame
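As a minimal sketch of the two approaches being compared (the list `dfs` is made up for illustration), both calls stack the same data frames; the visible difference is the class of the result:

```r
library(data.table)

# Hypothetical input: a list of small data.frames with identical columns
dfs <- lapply(1:3, function(i) data.frame(id = i, value = i * 10))

# Base R: do.call dispatches to rbind.data.frame
base_result <- do.call(rbind, dfs)

# data.table: takes the list directly and returns a data.table
dt_result <- rbindlist(dfs)

class(base_result)
class(dt_result)
```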

Where does it really excel

Some questions that show where rbindlist shines are

how to merge a list of data.frames by row

Trouble converting long list of data.frames (~1 million) to single data.frame using do.call and ldply

These have benchmarks that show how fast it can be.

rbind.data.frame is slow, for a reason: it does lots of checking, and will match by name (i.e. it will account for the fact that columns may be in different orders, and match them up by name). rbindlist doesn't do this kind of checking, and will join by position.

e.g.

do.call(rbind, list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##    a b
## 1  1 2
## 2  2 3
## 3  2 1
## 4  3 2

rbindlist(list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3)))
##     a b
##  1: 1 2
##  2: 2 3
##  3: 1 2
##  4: 2 3
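The by-position versus by-name difference can be made explicit. The sketch below assumes a data.table version whose rbindlist accepts a `use.names` argument, which may postdate the behaviour described in this answer; treat it as an illustration rather than a statement about every release.

```r
library(data.table)

l <- list(data.frame(a = 1:2, b = 2:3), data.frame(b = 1:2, a = 2:3))

# Bind strictly by position: the second table's 'b' lands under 'a'
by_position <- rbindlist(l, use.names = FALSE)

# Bind by column name, which is what rbind.data.frame does
by_name <- rbindlist(l, use.names = TRUE)

by_position
by_name
```

Binding by position skips the name lookup, which is part of why it is cheap, but it silently mixes columns when the inputs are ordered differently.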

Some other limitations of rbindlist

It used to struggle to deal with factors, due to a bug that has since been fixed:

rbindlist two data.tables where one has factor and other has character type for a column (Bug #2650)

It has problems with duplicate column names

see Warning message: in rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table? (Bug #2384)

rbind.data.frame rownames can be frustrating

rbindlist can handle lists, data.frames and data.tables, and will return a data.table without rownames

you can get in a muddle of rownames using do.call(rbind, list(...)); see

How to avoid renaming of rows when using rbind inside do.call?
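A small sketch of the rowname muddle, using a hypothetical named list `l`: with do.call(rbind, ...) the list names leak into the row names, whereas rbindlist returns a data.table with a plain row sequence.

```r
library(data.table)

# Hypothetical named list; the names end up in the row names under rbind
l <- list(x = data.frame(a = 1:2), y = data.frame(a = 3:4))

df <- do.call(rbind, l)
rownames(df)   # built from the list names plus a row counter

dt <- rbindlist(l)
rownames(dt)   # plain sequence; data.table keeps no meaningful row names
```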

Memory efficiency

In terms of memory, rbindlist is implemented in C, so it is memory efficient; it uses setattr to set attributes by reference. rbind.data.frame is implemented in R; it does lots of assigning, and uses attr<- (and class<- and rownames<-), all of which will (internally) create copies of the created data.frame.
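To illustrate what "by reference" means here, the following sketch uses data.table's exported `setattr` and `address` helpers on a throwaway vector `x` (a made-up example, not taken from rbindlist's internals). It only shows that setattr leaves the object at the same memory address; it makes no claim about how many copies the base R replacement functions take in a given R version.

```r
library(data.table)

x <- c(1, 2, 3)
before <- address(x)                  # memory address before the change

setattr(x, "note", "set by reference")  # modifies x in place
after <- address(x)

identical(before, after)              # the same object was modified
attr(x, "note")
```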