Sam Thomas Sam Thomas - 3 months ago 18
R Question

Create multiple lead variables in data.table

This question is similar to Creating a bunch of lagged variables in data.table at once and R: How to create a lag variable for each by group?, but not quite the same as far as I can tell.

I would like to create several lead variables, e.g.

lead1
,
lead2
, and
lead3
below, grouped by
groups
.

Example data

require(data.table)
set.seed(1)
data <- data.table(time =c(1:10,1:8),groups = c(rep(c("a","b"),c(10,8))),
value = rnorm(18))
data
time groups value
1: 1 a -0.62645381
2: 2 a 0.18364332
3: 3 a -0.83562861
4: 4 a 1.59528080
5: 5 a 0.32950777
6: 6 a -0.82046838
7: 7 a 0.48742905
8: 8 a 0.73832471
9: 9 a 0.57578135
10: 10 a -0.30538839
11: 1 b 1.51178117
12: 2 b 0.38984324
13: 3 b -0.62124058
14: 4 b -2.21469989
15: 5 b 1.12493092
16: 6 b -0.04493361
17: 7 b -0.01619026
18: 8 b 0.94383621


The resulting data table should be

time groups value lead1 lead2 lead3
1 1 a -0.62645381 0.18364332 -0.83562861 1.59528080
2 2 a 0.18364332 -0.83562861 1.59528080 0.32950777
3 3 a -0.83562861 1.59528080 0.32950777 -0.82046838
4 4 a 1.59528080 0.32950777 -0.82046838 0.48742905
5 5 a 0.32950777 -0.82046838 0.48742905 0.73832471
6 6 a -0.82046838 0.48742905 0.73832471 0.57578135
7 7 a 0.48742905 0.73832471 0.57578135 -0.30538839
8 8 a 0.73832471 0.57578135 -0.30538839 NA
9 9 a 0.57578135 -0.30538839 NA NA
10 10 a -0.30538839 NA NA NA
11 1 b 1.51178117 0.38984324 -0.62124058 -2.21469989
12 2 b 0.38984324 -0.62124058 -2.21469989 1.12493092
13 3 b -0.62124058 -2.21469989 1.12493092 -0.04493361
14 4 b -2.21469989 1.12493092 -0.04493361 -0.01619026
15 5 b 1.12493092 -0.04493361 -0.01619026 0.94383621
16 6 b -0.04493361 -0.01619026 0.94383621 NA
17 7 b -0.01619026 0.94383621 NA NA
18 8 b 0.94383621 NA NA NA


Note that my actual data set is much larger and I may need more than 3 lead variables.

I am using
data.table
version 1.9.4 and am not sure when I will be able to update to the latest version, so a solution in this version would be a bonus. Sorry for this additional constraint.

Thanks in advance.

Answer

The function below will create lead columns for all lead values specified in the leads argument, which is just a vector of positive integers.

library(data.table)

lead.n = function(leads, values) {
  as.data.frame(sapply(leads, function(n) {
    lead.vals = c(values[-c(1:n)], rep(NA,n))
  }
  ))
}

data[, paste0("lead",1:3):=lead.n(1:3,value), by=groups]

    time groups       value       lead1       lead2       lead3
 1:    1      a -0.62645381  0.18364332 -0.83562861  1.59528080
 2:    2      a  0.18364332 -0.83562861  1.59528080  0.32950777
 3:    3      a -0.83562861  1.59528080  0.32950777 -0.82046838
 4:    4      a  1.59528080  0.32950777 -0.82046838  0.48742905
 5:    5      a  0.32950777 -0.82046838  0.48742905  0.73832471
 6:    6      a -0.82046838  0.48742905  0.73832471  0.57578135
 7:    7      a  0.48742905  0.73832471  0.57578135 -0.30538839
 8:    8      a  0.73832471  0.57578135 -0.30538839          NA
 9:    9      a  0.57578135 -0.30538839          NA          NA
10:   10      a -0.30538839          NA          NA          NA
11:    1      b  1.51178117  0.38984324 -0.62124058 -2.21469989
12:    2      b  0.38984324 -0.62124058 -2.21469989  1.12493092
13:    3      b -0.62124058 -2.21469989  1.12493092 -0.04493361
14:    4      b -2.21469989  1.12493092 -0.04493361 -0.01619026
15:    5      b  1.12493092 -0.04493361 -0.01619026  0.94383621
16:    6      b -0.04493361 -0.01619026  0.94383621          NA
17:    7      b -0.01619026  0.94383621          NA          NA
18:    8      b  0.94383621          NA          NA          NA
Comments