aiolias aiolias - 1 month ago 9
R Question

Cohort analysis table with relative weeks

I have the table similar to this:

dt <- data.table(id=c("123","123","234","234","345","345","345","456","456"),
treatment=c("control","control","variable","variable","control","control","control","control","control"),
cohort=c("2015-08-10","2015-08-10","2015-08-10","2015-08-10","2015-08-17","2015-08-17","2015-08-17","2015-08-17","2015-08-17"),
visit_date=c("2015-08-10", "2015-08-11","2015-08-12", "2015-08-18","2015-08-19","2015-08-31","2015-09-01","2015-08-19","2015-08-27"),
visit_week=c("2015-08-10", "2015-08-10","2015-08-10", "2015-08-17","2015-08-17","2015-08-31","2015-08-31","2015-08-17","2015-08-24"))
> dt
id treatment cohort visit_date visit_week
1: 123 control 2015-08-10 2015-08-10 2015-08-10
2: 123 control 2015-08-10 2015-08-11 2015-08-10
3: 234 variable 2015-08-10 2015-08-12 2015-08-10
4: 234 variable 2015-08-10 2015-08-18 2015-08-17
5: 345 control 2015-08-17 2015-08-19 2015-08-17
6: 345 control 2015-08-17 2015-08-31 2015-08-31
7: 345 control 2015-08-17 2015-09-01 2015-08-31
8: 456 control 2015-08-17 2015-08-19 2015-08-17
9: 456 control 2015-08-17 2015-08-27 2015-08-24


I'm trying to output something like this:

cohort treatment visit_week_1 visit_week_2 visit_week_3
1: 2015-08-10 control 1 0 0
2: 2015-08-10 variable 1 1 0
3: 2015-08-17 control 2 1 1


I tried using dcast, but there's something wrong with my command because the counts are off:

> dcast(dt, cohort+treatment ~ paste0("visit_week_", dt[, seq_len(.N), by=id]$V1), value.var="visit_week", function(x) length(unique(x)))
cohort treatment visit_week_1 visit_week_2 visit_week_3
1 2015-08-10 control 1 1 0
2 2015-08-10 variable 1 1 0
3 2015-08-17 control 1 2 1


Additional note for clarification: Each visit week needs to be relative to each cohort. So
visit_week
1:3 for cohort
2015-08-10
would be "2015-08-10","2015-08-17","2015-08-24". For cohort
2015-08-17
weeks 1:3 would be "2015-08-17","2015-08-24","2015-08-31"

Answer

You can use dplyr and tidyr:

library(dplyr)
library(tidyr)
dt %>% group_by(cohort, treatment, visit_week) %>%
       summarise(visits = n()) %>%
       mutate(week = paste0("visit_week_", as.numeric(as.factor(visit_week)))) %>%
       dplyr::select(-visit_week) %>%
       spread(week, visits, fill = 0)

Source: local data table [3 x 5]
Groups: 

      cohort treatment visit_week_1 visit_week_2 visit_week_3
       (chr)     (chr)        (dbl)        (dbl)        (dbl)
1 2015-08-10   control            2            0            0
2 2015-08-10  variable            1            1            0
3 2015-08-17   control            2            1            2