Sairam Reddipalli Sairam Reddipalli - 3 months ago 12
R Question

Match and replace columns of dataframe by multiple conditions

Cheers,
I have two data frames with the following structure.

DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 150
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180


Second data frame DF2 has the following structure:

DF2:
Airlines HeadQ Date
American ATL 09-30-2016
Delta ATL 03-31-2016


Now looking up with data frames DF1 and DF2, I would like to alter DF1 to the following data frame.

DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180


The condition is, lookup for Airlines and HeadQ of DF1 from DF2 and if DF1$Date < DF2$Date then make Cost_Index as 0 or else continue with Cost_Index.

I tried, unsuccessfully, with:

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ
& DF1$Date < DF2$Date, 0, DF1$Cost_Index)


Can anyone point me to right direction?
Note:

str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"

Answer

Using the recently implemented conditional joins feature in the current development version of data.table, v1.9.7, I'd do this as follows:

require(data.table) # v1.9.7
# convert to data.tables, and Date column to Date class.
setDT(df1)[, Date := as.Date(Date, format = "%m-%d-%Y")]
setDT(df2)[, Date := as.Date(Date, format = "%m-%d-%Y")]

df1[df2, on = .(Airlines, HeadQ, Date < Date), # find matching rows based on condition
      Cost_Index := 0L]                        # update column with 0 for those rows

df1
#    Airlines HeadQ       Date Cost_Index
# 1: American   PHX 2016-07-31        220
# 2: American   ATL 2016-08-31          0
# 3: American   ATL 2016-10-31        150
# 4:    Delta   ATL 2016-10-31        180

You can install the dev version by following these instructions.