sriya sriya - 1 month ago 10
R Question

How to get average of lag period in a frequency as a new variable in r

I am trying to arrange my panel data set to create new variable using averages of regular occurring lag periods. My example data set look like below and can be accessed using below codes

Time ID Value1 Value2
1 Jan-14 A 12 NA
2 Feb-14 A 14 NA
3 Mar-14 A 15 NA
4 Apr-14 A 18 NA
5 May-14 A 10 NA
6 Jun-14 A 12 13.67
7 Jul-14 A 13 15.67
8 Aug-14 A 14 14.33
9 Jan-14 B 32 NA
10 Feb-14 B 14 NA
11 Mar-14 B 15 NA
12 Apr-14 B 18 NA
13 May-14 B 20 NA
14 Jun-14 B 12 20.33
15 Jul-14 B 13 15.67
16 Aug-14 B 14 17.78

df<-structure(list(Time = structure(c(4L, 3L, 7L, 1L, 8L, 6L, 5L,
2L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L), .Label = c("Apr-14", "Aug-14",
"Feb-14", "Jan-14", "Jul-14", "Jun-14", "Mar-14", "May-14"), class = "factor"),
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Value1 = c(12L, 14L, 15L, 18L, 10L, 12L, 13L, 14L, 32L, 14L,
15L, 18L, 20L, 12L, 13L, 14L)), .Names = c("Time", "ID",
"Value1"), class = "data.frame", row.names = c(NA, -16L))


I want to create new variable called Value2 considering the average of 3 months corresponding to the previous 4th-6th month for each observation (this is just for the example, my monthly data set goes for 12 years across 40 individuals, and also I will have to consider a higher lag than 3 months, perhaps 14 months). Here I need to consider the ID as well because need to retain the panel structure. For all the observations which do not have complete 3 months previously, should indicate as
NA
. For example
Jun_14_A
should get the average from
Mar_14_A
to
Jan_14_A
. (12+14+15)/3 =13.67

I referred some examples here but those do not provide quite what I am looking for specially lag averages and indicating as NA s when there are no previous 3 complete months.

Thank you in advance for any help

Answer

Here's another way using just data.table and it's shift function combined with Reduce (This is almost a dupe of this)

library(data.table)
setDT(df)[, Value2 := Reduce(`+`, shift(Value1, 3:5))/3, by = ID]
df
#       Time ID Value1   Value2
#  1: Jan-14  A     12       NA
#  2: Feb-14  A     14       NA
#  3: Mar-14  A     15       NA
#  4: Apr-14  A     18       NA
#  5: May-14  A     10       NA
#  6: Jun-14  A     12 13.66667
#  7: Jul-14  A     13 15.66667
#  8: Aug-14  A     14 14.33333
#  9: Jan-14  B     32       NA
# 10: Feb-14  B     14       NA
# 11: Mar-14  B     15       NA
# 12: Apr-14  B     18       NA
# 13: May-14  B     20       NA
# 14: Jun-14  B     12 20.33333
# 15: Jul-14  B     13 15.66667
# 16: Aug-14  B     14 17.66667