Sairam Reddy Sairam Reddy - 3 months ago 11
R Question

Matching and Replacing values of a column in a dataframe by date in r

Cheers,
I have two data frames with the following structure.

DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 150
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180
American ATL 08-31-2017 200


Second data frame DF2 has the following structure:

DF2:
Airlines HeadQ Date
American ATL 09-30-2016
Delta ATL 03-31-2017


Now looking up with data frames DF1 and DF2, I would like to alter DF1 to the following data frame.

DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180
American ATL 08-31-2017 200


The condition is, lookup for Airlines and HeadQ of DF1 from DF2 and if DF1$Date < DF2$Date then make Cost_Index as 0 or else continue with Cost_Index.

I tried, unsuccessfully, with:

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ
& DF1$Date < DF2$Date, 0, DF1$Cost_Index)


Warning:
1: In DF1$Airlines == DF2$Airlines : longer object
length is not a multiple of shorter object length".
2: In<=.default(DF1$Date, DF2$Date) : longer object length is not a
multiple of shorter object length

DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 0
Delta ATL 10-31-2016 0
American ATL 08-31-2017 200


Can anyone point me to right direction?

Note:

str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"

Answer

You can do a left join of DF1 and DF2 on the Airlines and HeadQ columns, and then change the value of Cost_Index based on if Date.x(Date from DF1) is smaller than Date.y(Date from DF2). With dplyr syntax (an additional check of if Date.y is NA is applied here according to your logic that if there is no match in DF2, the Cost_Index should remain the same):

library(dplyr)

# convert the Date column to Date class for comparison
df1$Date <- as.Date(df1$Date, "%m-%d-%Y")
df2$Date <- as.Date(df2$Date, "%m-%d-%Y")

df1 %>% 
       left_join(df2, by = c("Airlines", "HeadQ")) %>% 
       mutate(Cost_Index = ifelse(Date.x < Date.y & !is.na(Date.y), 0, Cost_Index), 
              Date = Date.x) %>% 
       select(-Date.x, -Date.y)

#   Airlines HeadQ Cost_Index       Date
# 1 American   PHX        220 2016-07-31
# 2 American   ATL          0 2016-08-31
# 3 American   ATL        150 2016-10-31
# 4    Delta   ATL        180 2016-10-31
Comments