Blacksad Blacksad - 11 days ago 5
R Question

Data.table: operation with group-shifted data

Consider the folowing

data.table
:

DT <- data.table(year = c(2011,2012,2013,2011,2012,2013,2011,2012,2013),
level = c(137,137,137,136,136,136,135,135,135),
valueIn = c(13,30,56,11,25,60,8,27,51))


I would like have the following ouput:

DT <- data.table(year = c(2011,2012,2013,2011,2012,2013,2011,2012,2013),
level = c(137,137,137,136,136,136,135,135,135),
valueIn = c(13,30,56, 11,25,60, 8,27,51),
valueOut = c(12,27.5,58, 9.5,26,55.5, NA,NA,NA))


In other words, I want to calculate the operation
(valueIn[level] - valueIn[level-1]) / 2
, according to the
year
. For example, the first value is calculated like this:
(13+11)/2=12
.

For the moment, I do that with
for
loops, in which I create
data.table
's subsets for each
level
:

levelDtList <- list()
levels <- sort(DT$level, decreasing = FALSE)
for (this.level in levels) {
levelDt <- DT[level == this.level]
if (this.level == min(levels)) {
valueOut <- NA
} else {
levelM1Data <- levelDtList[[this.level - 1]]
valueOut <- (levelDt$valueIn + levelM1Data$valueIn) / 2
}
levelDt$valueOut <- valueOut
levelDtList[[this.level]] <- levelDt
}
datatable <- rbindlist(levelDtList)


This is ugly and quite slow, so I am looking for a better, faster,
data.table
-based solution.

Answer

Using the shift-function with type = 'lead' to get the next value, sum and divide by two:

DT[, valueOut := (valueIn + shift(valueIn, type = 'lead'))/2, by = year]

you get:

   year level valueIn valueOut
1: 2011   137      13     12.0
2: 2012   137      30     27.5
3: 2013   137      56     58.0
4: 2011   136      11      9.5
5: 2012   136      25     26.0
6: 2013   136      60     55.5
7: 2011   135       8       NA
8: 2012   135      27       NA
9: 2013   135      51       NA

With all the parameter of the shift specified:

DT[, valueOut := (valueIn + shift(valueIn, n = 1L, fill = NA, type = 'lead'))/2, by = year]
Comments