Shuklaswag Shuklaswag - 1 year ago 71
R Question

Copying over the values of a column, based on the values of another column

I have a dataset, "data" where every few adjacent rows has the same "id", a parameter in the dataset. I have another dataset, "otherData", with the same id values, but only one row per id value. The order of the id's are the same in both datasets.

I want to copy a column "newColumn" from otherData to data, mapping it based on the id value. This is what I currently have:

sapply(otherData$id, function(id)
data$newColumn[data$id == id, ] <- otherData[otherData$id == id, ]$newColumn

However, sapply is disgustingly slow, since it iterates over each unique id value and has to find every row with that value in the data.frame.

Is there a faster alternative? I thought that perhaps I may be able to take advantage of the fact that all the rows with the same "id" value are adjacent to each other.

Answer Source

Simply merge the two dataframes but filter columns in otherData to join id key and newColumn:

newData <- merge(data, otherData[c("id", "newColumn")], by=c("id"))