Mike Mike - 1 month ago 10
R Question

Creating Subgroups based on Time Period using Lubridate and Dplyr

This should be a quick and easy question. Using the simple dataframe below, I want to use dplyr and lubridate to group all the clients who have an OnsetDate on or after April, 2015. This group would be called "NewOnset" and the remainder will be "OldOnset".

I'm new to lubridate and having some trouble.

City<-c("Toronto", "Toronto", "Montreal","Ottawa","Ottawa",
"Hamilton","Peterborough","Toronto","Hamilton","Hamilton")

OnsetDate<-c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")

Client<-c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")

DF<- data.frame(Client,City,OnsetDate)

Answer

With dplyr,

       # parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = if_else(OnsetDate >= as.Date('2015-04-01'),    # condition
                             'NewOnset',    # return if above (true)
                             'OldOnset'))   # return if below (false)

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>    <chr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset

Note the grouping here doesn't do anything, and you could do both operations in mutate, but you do get a grouped data.frame appropriate for further mutation or summarization.

An alternative would be to use cut.Date, which will return a factor:

# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = cut(OnsetDate, 
                         breaks = c(min(OnsetDate), as.Date('2015-04-01'), max(OnsetDate)), 
                         labels = c('OldOnset', 'NewOnset'), 
                         include.lowest = TRUE))

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>   <fctr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset
Comments