Ben Ben - 3 months ago 8
R Question

Update by reference with rolling join

How do I update a

data.table
by reference using a rolling join?

Example



Here I build a table of commercials and a table of sales.

# commercials
commercials<-data.table(commercialID=c("C1","C2","C3","C4"), commercialDate=as.Date(c("2014-1-1","2014-4-1","2014-7-1","2014-9-15")))
commercials[, rollDate:=commercialDate] #Add a column, rollDate equal to commercialDate
setkey(commercials, "rollDate")

commercials
commercialID commercialDate rollDate
1: C1 2014-01-01 2014-01-01
2: C2 2014-04-01 2014-04-01
3: C3 2014-07-01 2014-07-01
4: C4 2014-09-15 2014-09-15


# sales (A sale before all commercials, a sale after commercial1 and a sale after commercial2)
sales <- data.table(saleID=c("S0", "S1", "S2"), saleDate=as.Date(c("2010-12-31", "2014-2-1", "2014-5-1")))
sales[, rollDate:=saleDate]
setkey(sales, "rollDate")

sales
saleID saleDate rollDate
1: S0 2010-12-31 2010-12-31
2: S1 2014-02-01 2014-02-01
3: S2 2014-05-01 2014-05-01


I'd like to associate each commercial with the sale immediately before it and the sale immediately after it (inclusive of a sale on the same date as the commercial).




Attemp1

commercials[sales, roll=-Inf, `:=`(saleIDBefore=saleID, saleDateBefore=saleDate)]
commercials[sales, roll=Inf, `:=`(saleIDAfter=saleID, saleDateAfter=saleDate)]

commercials
commercialID commercialDate rollDate saleIDBefore saleDateBefore saleIDAfter saleDateAfter
1: C1 2014-01-01 2014-01-01 S0 2010-12-31 S1 2014-02-01
2: C2 2014-04-01 2014-04-01 S1 2014-02-01 S2 2014-05-01
3: C3 2014-07-01 2014-07-01 S2 2014-05-01 NA <NA>
4: C4 2014-09-15 2014-09-15 NA <NA> NA <NA>


FAIL. Notice the NA under saleIDBefore.




Correct (but suboptimal) solution

commercials <- sales[commercials, roll=Inf]
setnames(commercials, c("saleID", "saleDate"), c("saleIDBefore", "saleDateBefore"))
commercials <- sales[commercials, roll=-Inf]
setnames(commercials, c("saleID", "saleDate"), c("saleIDAfter", "saleDateAfter"))

commercials
saleIDAfter saleDateAfter rollDate saleIDBefore saleDateBefore commercialID commercialDate
1: S1 2014-02-01 2014-01-01 S0 2010-12-31 C1 2014-01-01
2: S2 2014-05-01 2014-04-01 S1 2014-02-01 C2 2014-04-01
3: NA <NA> 2014-07-01 S2 2014-05-01 C3 2014-07-01
4: NA <NA> 2014-09-15 S2 2014-05-01 C4 2014-09-15


But I'd like to do this using update by reference
:=
if it's possible.

Answer

Use non-equi joins (requires 1.9.7+):

commercials[sales, on = .(rollDate >= rollDate),
              `:=`(saleIDBefore=saleID, saleDateBefore=saleDate)][
            sales, on = .(rollDate <= rollDate), mult = 'last',
              `:=`(saleIDAfter=saleID, saleDateAfter=saleDate)][]
#   commercialID commercialDate   rollDate saleIDBefore saleDateBefore saleIDAfter saleDateAfter
#1:           C1     2014-01-01 2014-01-01           S0     2010-12-31          S1    2014-02-01
#2:           C2     2014-04-01 2014-04-01           S1     2014-02-01          S2    2014-05-01
#3:           C3     2014-07-01 2014-07-01           S2     2014-05-01          NA          <NA>
#4:           C4     2014-09-15 2014-09-15           S2     2014-05-01          NA          <NA>
Comments