Andy.Jian Andy.Jian - 27 days ago 17
R Question

How to conditionally combine data.frame object in the list in more elegant way?

I have data.frame in the list, and I intend to merge specific data.frame objects conditionally where merge second, third data.frame objects without duplication, then merge it with first data.frame objects. However, I used rbind function to do this task, but my approach is not elegant. Can anyone help me out the improve the solution ? How can I achieve more compatible solution that can be used in dynamic functional programming ? How can I get desired output ? Any idea ?

reproducible example:

dfList <- list(
DF.1 = data.frame(red=c(1,2,3), blue=c(NA,1,2), green=c(1,1,2)),
DF.2 = data.frame(red=c(2,3,NA), blue=c(1,2,3), green=c(1,2,4)),
DF.3 = data.frame(red=c(2,3,NA,NA), blue=c(1,2,NA,3), green=c(1,2,3,4))
)


dummy way to do it:

rbind(dfList[[1L]], unique(rbind(dfList[[2L]], dfList[[3L]])))


Apparently, my attempt is not elegant to apply in functional programming. How can make this happen elegantly ?

desired output :

red blue green
1 1 NA 1
2 2 1 1
3 3 2 2
11 2 1 1
21 3 2 2
31 NA 3 4
6 NA NA 3


How can I improve my solution more elegantly and efficiently ? Thanks in advance

Answer

The best (easiest and fastest way) to do this is data.table::rbindlist.

It would work like this:

library(data.table)
dfList <- list(
  DF.1 = data.table(red=c(1,2,3), blue=c(NA,1,2), green=c(1,1,2)),
  DF.2 = data.table(red=c(2,3,NA), blue=c(1,2,3), green=c(1,2,4)),
  DF.3 = data.table(red=c(2,3,NA,NA), blue=c(1,2,NA,3), green=c(1,2,3,4))
)

# part 1: list element 1
dt_1 <- dfList[[1]]

# part 2: all other list elements (in your case 2 and 3)
dt_2 <- unique(rbindlist(dfList[-1]))

# use rbindlist to bind the rows together
dt_all <- rbindlist(list(dt_1, dt_2))

Comment.

My solution is pretty close to your proposed solution. I think the "ugliness" about this way is that it is an edge case to merge datasets and deattach the first element (and treat it in a different way). The best solution would probably be to step back and think about the underlying idea and solve it using an additional variable in the datasets (i.e., for df1 and then for df2_3), which I would consider the R-way.

Something along this thought would look like this:

myList2 <- list(
  DF.1 = data.table(red=c(1,2,3), blue=c(NA,1,2), green=c(1,1,2), var = "df1"),
  DF.2 = data.table(red=c(2,3,NA), blue=c(1,2,3), green=c(1,2,4), var = "other"),
  DF.3 = data.table(red=c(2,3,NA,NA), blue=c(1,2,NA,3), green=c(1,2,3,4), var = "other")
)

dt <- rbindlist(myList2)
unique(dt)
# red blue green   var
# 1:   1   NA     1   df1
# 2:   2    1     1   df1
# 3:   3    2     2   df1
# 4:   2    1     1 other
# 5:   3    2     2 other
# 6:  NA    3     4 other
# 7:  NA   NA     3 other