DataTx DataTx - 22 days ago 12
R Question

Filter a Dataframe by Another Dataframe

Supposedly this question has already been answered. But the user who flagged my question failed to test the solution and the cited question does not work for my problem.

I have found questions on how to filter a dataframe using another list but I have not found something that shows how to filter a dataframe using another dataframe.

I have two dataframes and the first one can be thought of as a key of ID's and dates.

id date
1 id1 2016-06-23
2 id2 2016-06-25
3 id3 2016-06-23
4 id4 2016-06-25
5 id5 2016-06-27

structure(list(id = structure(1:5, .Label = c("id1", "id2", "id3",
"id4", "id5"), class = "factor"), date = structure(c(16975, 16977,
16975, 16977, 16979), class = "Date")), .Names = c("id", "date"
), row.names = c(NA, -5L), class = "data.frame")


I then have a second dataframe with ID's and dates and I would like to filter the second dataframe to only return rows that are after the date for the ID in the first row.

Here is the second dataframe:

id date
1 id1 2016-06-20
2 id1 2016-06-23
3 id1 2016-06-24
4 id2 2016-06-23
5 id3 2016-06-27

structure(list(id = structure(c(1L, 1L, 1L, 2L, 3L), .Label = c("id1",
"id2", "id3"), class = "factor"), date = structure(c(16972, 16975,
16976, 16975, 16979), class = "Date")), .Names = c("id", "date"
), row.names = c(NA, -5L), class = "data.frame")


And this is what the results would look like:

id date
1 id1 2016-06-24
2 id3 2016-06-27

Answer Source

Solution using data.table:

library(data.table)
setDT(d1)
setDT(d2)
merge(d1, d2, "id")[date.y > date.x, .(id, date = date.y)]

    id       date
1: id1 2016-06-24
2: id3 2016-06-27