Gyan Veda Gyan Veda - 1 year ago 81
R Question

How to append rows to an R data frame

I have looked around StackOverflow, but I cannot find a solution specific to my problem, which involves appending rows to an R data frame.

I am initializing an empty 2-column data frame, as follows.

df = data.frame(x = numeric(), y = character())


Then, my goal is to iterate through a list of values and, in each iteration, append a value to the end of the list. I started with the following code.

for (i in 1:10) {
df$x = rbind(df$x, i)
df$y = rbind(df$y, toString(i))
}


I also attempted the functions
c
,
append
, and
merge
without success. Please let me know if you have any suggestions.

Answer Source

Update

Not knowing what you are trying to do, I'll share one more suggestion: Preallocate vectors of the type you want for each column, insert values into those vectors, and then, at the end, create your data.frame.

Continuing with Julian's f3 (a preallocated data.frame) as the fastest option so far, defined as:

# pre-allocate space
f3 <- function(n){
  df <- data.frame(x = numeric(n), y = character(n), stringsAsFactors = FALSE)
  for(i in 1:n){
    df$x[i] <- i
    df$y[i] <- toString(i)
  }
  df
}

Here's a similar approach, but one where the data.frame is created as the last step.

# Use preallocated vectors
f4 <- function(n) {
  x <- numeric(n)
  y <- character(n)
  for (i in 1:n) {
    x[i] <- i
    y[i] <- i
  }
  data.frame(x, y, stringsAsFactors=FALSE)
}

microbenchmark from the "microbenchmark" package will give us more comprehensive insight than system.time:

library(microbenchmark)
microbenchmark(f1(1000), f3(1000), f4(1000), times = 5)
# Unit: milliseconds
#      expr         min          lq      median         uq         max neval
#  f1(1000) 1024.539618 1029.693877 1045.972666 1055.25931 1112.769176     5
#  f3(1000)  149.417636  150.529011  150.827393  151.02230  160.637845     5
#  f4(1000)    7.872647    7.892395    7.901151    7.95077    8.049581     5

f1() (the approach below) is incredibly inefficient because of how often it calls data.frame and because growing objects that way is generally slow in R. f3() is much improved due to preallocation, but the data.frame structure itself might be part of the bottleneck here. f4() tries to bypass that bottleneck without compromising the approach you want to take.


Original answer

This is really not a good idea, but if you wanted to do it this way, I guess you can try:

for (i in 1:10) {
  df <- rbind(df, data.frame(x = i, y = toString(i)))
}

Note that in your code, there is one other problem:

  • You should use stringsAsFactors if you want the characters to not get converted to factors. Use: df = data.frame(x = numeric(), y = character(), stringsAsFactors = FALSE)
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download