user88911 user88911 - 3 months ago 21
R Question

How to merge sets of vectors from multiple data.frames into one without duplicates?

I have position index vectors stored in several data.frame objects, but the order of the vectors (columns) differs from one data.frame to the next. I want to merge these data.frame objects into one common data.frame with a specific row order and no duplicate rows. Is there a trick for doing this more easily? Can anyone suggest an approach?

data



v1 <- data.frame(
foo=c(1,2,3),
bar=c(1,2,2),
bleh=c(1,3,0))

v2 <- data.frame(
bar=c(1,2,3),
foo=c(1,2,0),
bleh=c(3,3,4))

v3 <- data.frame(
bleh=c(1,2,3,4),
foo=c(1,1,2,0),
bar=c(0,1,2,3))


Initial output after combining them:



initial_output <- data.frame(
foo=c(1,2,3,1,2,0,1,1,2,0),
bar=c(1,2,2,1,2,3,0,1,2,3),
bleh=c(1,3,0,3,3,4,1,2,3,4)
)


After removing duplicates:



rmDuplicate_output <- data.frame(
foo=c(1,2,3,1,0,1,1),
bar=c(1,2,2,1,3,0,1),
bleh=c(1,3,0,3,4,1,2)
)


Final desired output:



final_output <- data.frame(
foo=c(1,1,1,1,2,3,0),
bar=c(0,1,1,1,2,2,3),
bleh=c(1,1,2,3,3,0,4)
)


How can I get my final desired output easily? Is there an efficient way to do this kind of manipulation on data.frame objects? Thanks

Answer

We can use bind_rows from dplyr, which matches columns by name, then remove the duplicate rows with distinct and sort with arrange by 'bar'.

library(dplyr)
bind_rows(v1, v2, v3) %>%
  distinct() %>%
  arrange(bar)
#   foo bar bleh
# 1   1   0    1
# 2   1   1    1
# 3   1   1    3
# 4   1   1    2
# 5   2   2    3
# 6   3   2    0
# 7   0   3    4
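
Sorting by 'bar' alone leaves tied rows in the order they were first encountered, so the rows with bleh = 3 and bleh = 2 come out swapped relative to final_output in the question. As a minimal sketch, and not part of the original answer, the same steps can be written in base R with 'foo' and 'bleh' added as tie-breakers. The names combined, deduped and result are only illustrative, and the choice of tie-breaker columns is an assumption inferred from the desired output.

```r
# Data from the question; the columns appear in a different order in each frame.
v1 <- data.frame(foo = c(1, 2, 3), bar = c(1, 2, 2), bleh = c(1, 3, 0))
v2 <- data.frame(bar = c(1, 2, 3), foo = c(1, 2, 0), bleh = c(3, 3, 4))
v3 <- data.frame(bleh = c(1, 2, 3, 4), foo = c(1, 1, 2, 0), bar = c(0, 1, 2, 3))

# rbind() on data frames matches columns by name, so the differing column
# order is handled; the result uses the column order of v1.
combined <- rbind(v1, v2, v3)

# Drop duplicate rows, keeping the first occurrence of each.
deduped <- unique(combined)

# Sort by bar, then foo, then bleh (assumed tie-breakers).
result <- deduped[order(deduped$bar, deduped$foo, deduped$bleh), ]
rownames(result) <- NULL  # reset row names after subsetting
result
```

With dplyr, the equivalent change would be arrange(bar, foo, bleh) in place of arrange(bar).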