JD Long - 9 days ago 7x

# Converting a list of data frames into one data frame in R

At one point my code ends up with a list of data frames, which I want to combine into a single large data frame.

I got some pointers from an earlier question that was trying to do something similar but more complex.

Here's an example of what I am starting with (this is grossly simplified for illustration):

``````
# Build a list of 100 data frames, each with 500 rows
listOfDataFrames <- vector(mode = "list", length = 100)

for (i in 1:100) {
  listOfDataFrames[[i]] <- data.frame(a = sample(letters, 500, replace = TRUE),
                                      b = rnorm(500), c = rnorm(500))
}
``````

I am currently using this:

``````
df <- do.call("rbind", listOfDataFrames)
``````
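For readers unfamiliar with `do.call`: it calls a function with the elements of a list as its arguments, so the line above amounts to writing `rbind(listOfDataFrames[[1]], listOfDataFrames[[2]], ...)` by hand. A minimal sketch with a small made-up list (`small_list`, `combined`, and `by_hand` are names chosen only for this illustration):

```r
# Hypothetical three-element list, much smaller than the example above
small_list <- list(
  data.frame(a = c("x", "y"), b = c(1, 2)),
  data.frame(a = c("z", "x"), b = c(3, 4)),
  data.frame(a = c("y", "z"), b = c(5, 6))
)

# do.call passes each list element as a separate argument to rbind
combined <- do.call("rbind", small_list)

# The same call written out by hand
by_hand <- rbind(small_list[[1]], small_list[[2]], small_list[[3]])

identical(combined, by_hand)  # both approaches give the same data frame
nrow(combined)                # one row per input row
```

The hand-written form stops being practical once the list has more than a few elements, which is why `do.call` is the usual choice here.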

One other option is to use a plyr function:

``````
library(plyr)  # ldply() comes from the plyr package

df <- ldply(listOfDataFrames, data.frame)
``````

This is a little slower than the original:

``````
> system.time({ df <- do.call("rbind", listOfDataFrames) })
   user  system elapsed
   0.25    0.00    0.25
> system.time({ df2 <- ldply(listOfDataFrames, data.frame) })
   user  system elapsed
   0.30    0.00    0.29
> identical(df, df2)
[1] TRUE
``````

My guess is that `do.call("rbind", ...)` is going to be the fastest approach you will find, unless you can (a) use matrices instead of data frames and (b) preallocate the final matrix and assign into it rather than growing it.
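To make the preallocation idea concrete, here is a rough sketch under some assumptions: the pieces are numeric matrices rather than data frames (so the letter column `a` is dropped, since a matrix holds a single type), and every piece has the same number of rows. The names `list_of_matrices`, `result`, `n_chunks`, and `chunk_rows` are made up for this illustration, and no claim is made here about how much faster it is in practice.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_chunks <- 100    # number of pieces, as in the question
chunk_rows <- 500  # rows per piece

# Hypothetical input: a list of numeric matrices instead of data frames
list_of_matrices <- lapply(seq_len(n_chunks), function(i) {
  cbind(b = rnorm(chunk_rows), c = rnorm(chunk_rows))
})

# Preallocate the full result once ...
result <- matrix(NA_real_, nrow = n_chunks * chunk_rows, ncol = 2,
                 dimnames = list(NULL, c("b", "c")))

# ... then fill each block of rows in place instead of growing the object
for (i in seq_len(n_chunks)) {
  rows <- ((i - 1) * chunk_rows + 1):(i * chunk_rows)
  result[rows, ] <- list_of_matrices[[i]]
}

# Check against binding the matrices together in one call
all.equal(result, do.call("rbind", list_of_matrices))
```

If a data frame is needed at the end, `as.data.frame(result)` converts the filled matrix in one step.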

Edit 1:

Based on Hadley's comment, here's the latest version of `rbind.fill` from CRAN:

``````
> system.time({ df3 <- rbind.fill(listOfDataFrames) })
   user  system elapsed
   0.24    0.00    0.23
> identical(df, df3)
[1] TRUE
``````

This is easier to write than `do.call("rbind", ...)` and marginally faster (these timings hold up over multiple runs). As far as I understand it, the version of `plyr` on GitHub is even faster than this.
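One way to check that a timing holds up over multiple runs is to repeat the measurement and look at the spread. The sketch below uses only base R and a smaller made-up input so it runs quickly. `frames` and `time_runs` are hypothetical names, not part of `plyr` or the code above.

```r
set.seed(42)  # arbitrary seed

# Smaller hypothetical input than in the question
frames <- lapply(1:20, function(i) {
  data.frame(a = sample(letters, 100, replace = TRUE),
             b = rnorm(100), c = rnorm(100))
})

# Hypothetical helper: call the zero-argument function `f` n_runs times
# and collect the elapsed time of each run
time_runs <- function(f, n_runs = 5) {
  vapply(seq_len(n_runs),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

elapsed <- time_runs(function() do.call("rbind", frames))
summary(elapsed)
```

With `plyr` loaded, passing `function() rbind.fill(frames)` to the same helper would give comparable numbers for the other approach.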