coip coip -4 years ago 91
R Question

Determine if a date in one column occurs after a date in another column, by group

Say I bill customers after a date of service and stop serving them if they haven't paid a bill. But the lag between a service date and the bill date makes it difficult to enforce when customers request additional service. To determine if the customers are delinquent, I would need to know if the date of the newly requested service occurs after an outstanding bill has been sent (which, could've been sent much later than the service date).

Sample Data

df <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C", "D"), class = "factor"), service.date = structure(c(1L, 3L, 5L, 6L, 2L, 9L, 4L, 7L, 8L, 10L), .Label = c("2011-01-01", "2011-01-03", "2011-02-01", "2011-03-01", "2011-03-02", "2011-04-02", "2011-05-09", "2011-08-19", "2011-09-02", "2011-09-10"), class = "factor"), bill.date = structure(c(4L, 5L, 2L, 6L, 9L, 1L, 8L, 10L, 3L, 7L), .Label = c("2011-08-09", "2011-08-10", "2011-08-11", "2011-08-12", "2011-08-13", "2011-08-14", "2011-08-15", "2011-08-16", "2011-08-17", "2011-08-19"), class = "factor")), .Names = c("id", "service.date", "bill.date"), class = "data.frame", row.names = c(NA, -10L))

# df
# id service.date bill.date
# A 2011-01-01 2011-08-12
# B 2011-02-01 2011-08-13
# C 2011-03-02 2011-08-10
# D 2011-04-02 2011-08-14
# A 2011-01-03 2011-08-17
# A 2011-09-02 2011-08-09
# B 2011-03-01 2011-08-16
# C 2011-05-09 2011-08-19
# B 2011-08-19 2011-08-11
# B 2011-09-10 2011-08-15


So if they requested additional service before the bill was sent for their initial service, they would not be considered delinquent yet. But if they request additional service after a bill has been issued and remains unpaid, they'd be delinquent.

Steps So Far
My idea is to use a grouping function, perhaps like
by()
, to find the first "bill.date" associated with a level in the factor variable "id" and then determine, for each "service.date" associated with each "id" level, if it occurs after the associated outstanding "bill.date" for said "id" level, ultimately creating a logical variable. Here is a sample of what I'd like to end up with:

Desired Outcome

df$delinquent <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

#df

# id service.date bill.date delinquent
# A 2011-01-01 2011-08-12 FALSE
# B 2011-02-01 2011-08-13 FALSE
# C 2011-03-02 2011-08-10 FALSE
# D 2011-04-02 2011-08-14 FALSE
# A 2011-01-03 2011-08-17 FALSE
# A 2011-09-02 2011-08-09 TRUE
# B 2011-03-01 2011-08-16 FALSE
# C 2011-05-09 2011-08-19 FALSE
# B 2011-08-19 2011-08-11 TRUE
# B 2011-09-10 2011-08-15 TRUE


So in the sample data, there are four 'customers' (named A, B, C, and D), and two of them would be marked delinquent (A and B) for getting service despite having an outstanding bill.

Answer Source
# Load some tidyverse libraries
require(dplyr)

# Convert factor dates to actual dates
df <- df %>% mutate(service.date = as.Date(service.date),
                    bill.date = as.Date(bill.date))

# If service date is later than earliest bill.date in each group, return delinquent
df %>% group_by(id) %>% mutate(delinquent = service.date > min(bill.date))
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download