d84_n1nj4 d84_n1nj4 - 2 months ago 11
R Question

How to rbind new rows from one data frame to an existing data frame in R

I would like to know how to append new data (rows) from one data frame, df2, to an existing data frame, df1, based on a unique value in each table. So I have an existing data frame, df1, that has historical data and each row has a unique value. I then pull data from the web and put it into a new data frame, df2. The new data frame also includes a unique value which may or may not match a unique value in df1.

I would like to take all rows in df2 that have a unique value that does not exist in df1, and append those rows to df1. My initial thoughts were to use code similar to this:

ifelse(any(df1$unique_val==df2$unique_val), df1 <- df1, df1 <- rbind(df2, df1))


But then I realized that I need a more one-to-one match than an "any" match. I know how I would do this in SQL with a UNION and WHERE clause, but I'm not sure how to make it work in R. The only related items I could find researching were appending all data from two data frames or appending a new column to an existing data frame.

The following example shows what I am looking for and why I am not looking to "join" these two data frames"

df1 = data.frame(numb = c(1:6), rand = c(rep("Toaster",6)))


df1$unique_val <- paste0(df1$numb, df1$rand)


> df1
numb rand unique_val
1 1 Toaster 1Toaster
2 2 Toaster 2Toaster
3 3 Toaster 3Toaster
4 4 Toaster 4Toaster
5 5 Toaster 5Toaster
6 6 Toaster 6Toaster


df2 = data.frame(numb = c(5:7), rand = c(rep("Toaster",2), c(rep("Radio",1))))


df2$unique_val <- paste0(df2$numb, df2$rand)


> df2
numb rand unique_val
1 5 Toaster 5Toaster
2 6 Toaster 6Toaster
3 7 Radio 7Radio


As you can see, row 3 in df2 is the only new row (a row that does not have a matching unique_val in df1). I would like to add this new row to df1. Note: it's not always the same row that is new in df2.

I used each of the joins from this post, merge/join data frames as follows:

merge(df1,df2, by = "unique_val")


merge(df1,df2, by = "unique_val", all = TRUE)


merge(df1,df2, by = "unique_val", all.x = TRUE)


merge(df1,df2, by = "unique_val", all.y = TRUE)


I also tried the anti_join from dplyr:

anti_join(df1,df2, by = "unique_val")


Rbind gives me the following:

rbind(df1,df2)
numb rand conc
1 1 Toaster 1Toaster
2 2 Toaster 2Toaster
3 3 Toaster 3Toaster
4 4 Toaster 4Toaster
5 5 Toaster 5Toaster
6 6 Toaster 6Toaster
7 5 Toaster 5Toaster
8 6 Toaster 6Toaster
9 7 Radio 7Radio


None of which give me the desired output of the following:


numb rand conc
1 1 Toaster 1Toaster
2 2 Toaster 2Toaster
3 3 Toaster 3Toaster
4 4 Toaster 4Toaster
5 5 Toaster 5Toaster
6 6 Toaster 6Toaster
7 7 Radio 7Radio


I'm looking to rbind these data frames, not join them.

Answer

We can use rbindlist/unique from data.table. We place the datasets in a list, use rbindlist (from data.table) to rbind the datasets in the list to a single data.table and get the unique rows with unique from data.table which also has the by option to specify the variable.

library(data.table)
unique(rbindlist(list(df1, df2)), by = "numb")
#   numb    rand unique_val
#1:    1 Toaster   1Toaster
#2:    2 Toaster   2Toaster
#3:    3 Toaster   3Toaster
#4:    4 Toaster   4Toaster
#5:    5 Toaster   5Toaster
#6:    6 Toaster   6Toaster
#7:    7   Radio     7Radio
Comments