Mehdi Farhangian Mehdi Farhangian - 4 months ago 10
R Question

Map two data-frames in R in condition that the time of one of them is earlier than the other

I wish to merge two data sets by their user ID. My problem is I had to filter those of a data set that happened after the other. A simple example is

# Dataset 1 (dts1)

User ID date Hour transactions
1 5 25/07/2016 02:32 4
2 6 24/07/2016 02:42 2
3 8 25/07/2016 02:52 3
4 9 24/07/2016 03:02 4
5 11 25/07/2016 03:12 1
6 13 26/07/2016 03:22 3


and

# Dataset 2 (dts2)

User ID date Hour Events
1 5 25/07/2016 02:31 8
2 5 26/07/2016 02:42 6
3 5 24/07/2016 07:52 9
4 14 24/07/2016 03:02 5
5 5 25/07/2016 09:12 10
6 4 26/07/2016 03:22 4


I wish only map those from data set2 which happened before dataset1. So, ideally my output will be like that

#output
User ID Events Events transactions
1 5 8 9 4

Answer

Given data dts1 and dts2, and assuming date and Hour are characters:

> dts1
  UserID       date  Hour transactions
1      5 25/07/2016 02:32            4
2      6 24/07/2016 02:42            2
3      8 25/07/2016 02:52            3
4      9 24/07/2016 03:02            4
5     11 25/07/2016 03:12            1
6     13 26/07/2016 03:22            3
> dts2
  UserID       date  Hour Events
1      5 25/07/2016 02:31      8
2      5 26/07/2016 02:42      6
3      5 24/07/2016 07:52      9
4     14 24/07/2016 03:02      5
5      5 25/07/2016 09:12     10
6      4 26/07/2016 03:22      4

The basic idea is to make the times in the two dataframes comparable. First we convert the date/hour in dts2 into POSIX class:

dts2$time <- strptime(paste(dts2$date, dts2$Hour), format="%d/%m/%Y %H:%M")

Then we use apply to iterate through dts1, finding rows from dts2 that match UserID and with the condition that the time is earlier than the time in dataset 1:

dts1$Events <- apply(dts1[,c("UserID","date","Hour")], MAR=1, function(x) {
    time1 <- strptime(paste(x[2], x[3]), format="%d/%m/%Y %H:%M")
    rows <- which(dts2$UserID==as.numeric(x[1]) & dts2$time<time1)
    if (length(rows)>0) {            
        dts2$Events[rows]
    } else {
        NA
    }
})

Result:

> dts1
  UserID       date  Hour transactions Events
1      5 25/07/2016 02:32            4   8, 9
2      6 24/07/2016 02:42            2     NA
3      8 25/07/2016 02:52            3     NA
4      9 24/07/2016 03:02            4     NA
5     11 25/07/2016 03:12            1     NA
6     13 26/07/2016 03:22            3     NA
Comments