Milosz Milosz - 2 days ago 5
R Question

How to get the maximum value by id over the last 4 quarters in r data.table?

I have a dataset with id's and states. I need to get the maximum status over the last 4 quarters. As an example, I created this table:

library(data.table)
library(zoo)

DT =
data.table(id=c("1","1","1","1","1","1","1","1","2"),
date=as.yearqtr(c("2015Q1", "2014Q4", "2014Q3", "2014Q2", "2014Q1", "2013Q4","2013Q3","2013Q2","2015Q2")),
status=c(0,0,0,0,0,0,1,0,0))


I want to get the maximum value for
status
over the last 4 quarters available:

DT =
data.table(id=c("1","1","1","1","1","1","1","1","2"),
date=as.yearqtr(c("2015Q1", "2014Q4", "2014Q3", "2014Q2", "2014Q1","2013Q4","2013Q3","2013Q2","2015Q2")),
status=c(0,0,0,0,0,0,1,0,0),
Max_status=c(0,0,0,1,1,1,1,0,0))


Does anyone have any clue to do that neatly and quickly in data.table? I tried something like this:

DT[,Date_1yPrior:=date-1]
DT[,Max_Status:=max(status[date>=Date_1yPrior & date<=Date_1yPrior]),by='id']


but the j and i refers are not correct. I would prefer a
data.table
solution.

Thank you in advance!

Answer

I think this does the trick:

library(zoo)
library(data.table)
DT[, date_num := as.numeric(date)]
DT[, NextStatus1Q := .SD[match(date_num - .25, .SD$date_num), status], by=id]
DT[, NextStatus2Q := .SD[match(date_num - .5, .SD$date_num), status], by=id]
DT[, NextStatus3Q := .SD[match(date_num - .75, .SD$date_num), status], by=id]
DT[, NextStatus4Q := .SD[match(date_num - 1, .SD$date_num), status], by=id]
DT[,MaxStatus:=max(NextStatus1Q,NextStatus2Q,NextStatus3Q,status,0,na.rm=T),by=c('id','date')]
#DT[,MaxStatus:=max(NextStatus1Q,NextStatus2Q,NextStatus3Q,NextStatus4Q,status,0,na.rm=T),by=c('id','date')] # If you also want 4q (actually 5 quarters back)

Create new columns where you lookup the value 1-3 quarters ago and after that select the max.

Comments