gibbz00 gibbz00 - 29 days ago 6
R Question

Dynamic and conditional inserting of new rows that come after a certain Date

Following are what my data frames look like(Thanks to nice edits from experienced community):

library(data.table)
df <- fread('Account Date Blue Red Amount
A 1/1/2016 1 0 100
A 2/1/2016 1 1 200
B 1/10/2016 0 1 300
B 2/10/2016 1 1 400')
df[, Date := as.Date(Date, format="%m/%d/%Y")]

blue <- fread('Date Amount
6/1/2015 55
1/31/2016 55
2/28/2016 65
3/31/2016 75')
blue[, Date := as.Date(Date, format="%m/%d/%Y")]

red <- fread('Date Amount
12/31/2015 43
1/15/2016 47
2/15/2016 67
3/15/2016 77')
red[, Date := as.Date(Date, format="%m/%d/%Y")]


In the primary dataframe df, Blue and Red fields depict which category an Account belongs to at a given point in time.For example, as of 1/1/2016, Account A only belong to Blue category. blue and red dataframes depict at which dates cash was given out to all accounts in Blue and Red Category. I want to insert new rows in the original df with only the rows that come after the Date field in df from blue and red dataframes based on whether the account belongs to Blue or Red or both.

The output I am looking for looks like this:

Account Date Blue Red Amount
A 1/1/2016 1 0 100
A 1/31/2016 1 0 55
A 2/1/2016 1 1 200
A 2/15/2016 1 1 67
A 2/28/2016 1 1 65
A 3/15/2016 1 1 77
A 3/31/2016 1 1 75
B ..............................


In the output, as of 1/1/2016 Account A only belongs to Blue category. My goal is find the date immediately AFTER 1/1/2016 in the blue table which is 1/31/2016 and then insert it. I don't want to insert 1/15/2016 from the red table because account A is not category Red as of 1/1/2016. I am ok with Blue and Red fields showing up NA for inserted fields.

My thought was trying
rbind(df, blue, red), by="Account")
but dont know how to incorporate the conditions of only inserting later dates based on which category an Account belongs to at a given point in time.

Answer

A possible approach:

# combine the 'blue' & 'red' into one and create an 'colcat' column on the fly
br <- rbindlist(list(blue, red), 
                idcol = 'colcat')[, colcat := c('blue','red')[colcat]]

# loop over the rows of 'df', select the needed rows from 'bluered' 
# and punt the result into a list
brlist <- lapply(df$Date, function(x) br[Date > x][order(Date)])

# loop over the rows, select the needed rows from 'bluered' & bind them together
lst <- lapply(1:nrow(df), function(i) {
  idx <- c('blue','red')[c(c(1)[!!df[i][['Blue']]], c(2)[!!df[i][['Red']]])]
  incs <- brlist[[i]][colcat %in% idx][, .SD[1], colcat][, .(Account = df$Account[i], Date, Blue = df$Blue[i], Red = df$Red[i], Amount)]
  rbind(df[i],incs)
})

# bind the resulting list into one 'data.table' again
DT <- rbindlist(lst)

which gives:

> DT
    Account       Date Blue Red Amount
 1:       A 2016-01-01    1   0    100
 2:       A 2016-01-31    1   0     55
 3:       A 2016-02-01    1   1    200
 4:       A 2016-02-15    1   1     67
 5:       A 2016-02-28    1   1     65
 6:       B 2016-01-10    0   1    300
 7:       B 2016-01-15    0   1     47
 8:       B 2016-02-10    1   1    400
 9:       B 2016-02-15    1   1     67
10:       B 2016-02-28    1   1     65
Comments