coip -4 years ago 91
R Question

# Determine if a date in one column occurs after a date in another column, by group

Say I bill customers after a date of service and stop serving them if they haven't paid a bill. But the lag between a service date and the bill date makes it difficult to enforce when customers request additional service. To determine if the customers are delinquent, I would need to know if the date of the newly requested service occurs after an outstanding bill has been sent (which, could've been sent much later than the service date).

Sample Data

``````df <- structure(list(id = structure(c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C", "D"), class = "factor"), service.date = structure(c(1L, 3L, 5L, 6L, 2L, 9L, 4L, 7L, 8L, 10L), .Label = c("2011-01-01", "2011-01-03", "2011-02-01", "2011-03-01", "2011-03-02", "2011-04-02", "2011-05-09", "2011-08-19", "2011-09-02", "2011-09-10"), class = "factor"), bill.date = structure(c(4L, 5L, 2L, 6L, 9L, 1L, 8L, 10L, 3L, 7L), .Label = c("2011-08-09", "2011-08-10", "2011-08-11", "2011-08-12", "2011-08-13", "2011-08-14", "2011-08-15", "2011-08-16", "2011-08-17", "2011-08-19"), class = "factor")), .Names = c("id", "service.date", "bill.date"), class = "data.frame", row.names = c(NA, -10L))

# df
# id  service.date     bill.date
# A   2011-01-01       2011-08-12
# B   2011-02-01       2011-08-13
# C   2011-03-02       2011-08-10
# D   2011-04-02       2011-08-14
# A   2011-01-03       2011-08-17
# A   2011-09-02       2011-08-09
# B   2011-03-01       2011-08-16
# C   2011-05-09       2011-08-19
# B   2011-08-19       2011-08-11
# B   2011-09-10       2011-08-15
``````

So if they requested additional service before the bill was sent for their initial service, they would not be considered delinquent yet. But if they request additional service after a bill has been issued and remains unpaid, they'd be delinquent.

Steps So Far
My idea is to use a grouping function, perhaps like
`by()`
, to find the first "bill.date" associated with a level in the factor variable "id" and then determine, for each "service.date" associated with each "id" level, if it occurs after the associated outstanding "bill.date" for said "id" level, ultimately creating a logical variable. Here is a sample of what I'd like to end up with:

Desired Outcome

``````df\$delinquent <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)

#df

# id    service.date    bill.date   delinquent
# A     2011-01-01      2011-08-12   FALSE
# B     2011-02-01      2011-08-13   FALSE
# C     2011-03-02      2011-08-10   FALSE
# D     2011-04-02      2011-08-14   FALSE
# A     2011-01-03      2011-08-17   FALSE
# A     2011-09-02      2011-08-09   TRUE
# B     2011-03-01      2011-08-16   FALSE
# C     2011-05-09      2011-08-19   FALSE
# B     2011-08-19      2011-08-11   TRUE
# B     2011-09-10      2011-08-15   TRUE
``````

So in the sample data, there are four 'customers' (named A, B, C, and D), and two of them would be marked delinquent (A and B) for getting service despite having an outstanding bill.

``````# Load some tidyverse libraries