Veraaa Veraaa - 3 months ago 11
R Question

Merge data in r based on conditions and creating a new column

I'm a beginner with R and I want to merge two datasets together, based on the ID column. If the ID number of dataset 2 exists in dataset 1, then I want the data for this particular row from dataset 2 to be added to the row in dataset 1.

Secondly, if there is a match, I want to put a "1" in a new column called match for that specific row/ match and "0" if there wasn't a match.

Example:

Dataset 1:

Id category
123 3
124 1
125 2

Dataset 2:

Id score category
123 0.24 3
124 0.83 1
126 0.92 2


Final example with the added column:

Id score category match
123 0.24 3 1
124 0.83 1 1
125 NA 1 0
126 0.92 2 1


I've tried this so far (and some other combinations), but that did not give me good results:

data <- merge(df1, df2, by ="ID" , all.x = TRUE)


I would really appreciate any help!

Reproducible code:

df1 <- data.frame(ID=c("123","124","125"), category=c(3,1,2)
df2 <- data.frame(ID=c("123","124","126"), score=("0.24","0.83","0.92"), category=c("3","1","2")

Answer

You are almost there. I have slightly modified your example.

df1 <- data.frame(ID = c(123, 124, 125),
                  category=c(3, 1, 2))
df1

df2 <- data.frame(ID = c(123, 124, NA),
                  score = c(0.24, 0.83, 0.35),
                  category = c(3, 1, 2))
df2

df2$match <- 1
df2

data <- merge(df1, df2, by = c("ID", "category") , all.x = TRUE)
data$match[is.na(data$match)] <- 0
data
Comments