sriya - 1 year ago 64
R Question

# How to get average of lag period in a frequency as a new variable in r

I am trying to arrange my panel data set to create new variable using averages of regular occurring lag periods. My example data set look like below and can be accessed using below codes

``````     Time ID  Value1 Value2
1  Jan-14  A     12     NA
2  Feb-14  A     14     NA
3  Mar-14  A     15     NA
4  Apr-14  A     18     NA
5  May-14  A     10     NA
6  Jun-14  A     12  13.67
7  Jul-14  A     13  15.67
8  Aug-14  A     14  14.33
9  Jan-14  B     32     NA
10 Feb-14  B     14     NA
11 Mar-14  B     15     NA
12 Apr-14  B     18     NA
13 May-14  B     20     NA
14 Jun-14  B     12  20.33
15 Jul-14  B     13  15.67
16 Aug-14  B     14  17.78

df<-structure(list(Time = structure(c(4L, 3L, 7L, 1L, 8L, 6L, 5L,
2L, 4L, 3L, 7L, 1L, 8L, 6L, 5L, 2L), .Label = c("Apr-14", "Aug-14",
"Feb-14", "Jan-14", "Jul-14", "Jun-14", "Mar-14", "May-14"), class = "factor"),
ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"),
Value1 = c(12L, 14L, 15L, 18L, 10L, 12L, 13L, 14L, 32L, 14L,
15L, 18L, 20L, 12L, 13L, 14L)), .Names = c("Time", "ID",
"Value1"), class = "data.frame", row.names = c(NA, -16L))
``````

I want to create new variable called Value2 considering the average of 3 months corresponding to the previous 4th-6th month for each observation (this is just for the example, my monthly data set goes for 12 years across 40 individuals, and also I will have to consider a higher lag than 3 months, perhaps 14 months). Here I need to consider the ID as well because need to retain the panel structure. For all the observations which do not have complete 3 months previously, should indicate as
`NA`
. For example
`Jun_14_A`
should get the average from
`Mar_14_A`
to
`Jan_14_A`
. (12+14+15)/3 =13.67

I referred some examples here but those do not provide quite what I am looking for specially lag averages and indicating as NA s when there are no previous 3 complete months.

Thank you in advance for any help

Here's another way using just `data.table` and it's `shift` function combined with `Reduce` (This is almost a dupe of this)

``````library(data.table)
setDT(df)[, Value2 := Reduce(`+`, shift(Value1, 3:5))/3, by = ID]
df
#       Time ID Value1   Value2
#  1: Jan-14  A     12       NA
#  2: Feb-14  A     14       NA
#  3: Mar-14  A     15       NA
#  4: Apr-14  A     18       NA
#  5: May-14  A     10       NA
#  6: Jun-14  A     12 13.66667
#  7: Jul-14  A     13 15.66667
#  8: Aug-14  A     14 14.33333
#  9: Jan-14  B     32       NA
# 10: Feb-14  B     14       NA
# 11: Mar-14  B     15       NA
# 12: Apr-14  B     18       NA
# 13: May-14  B     20       NA
# 14: Jun-14  B     12 20.33333
# 15: Jul-14  B     13 15.66667
# 16: Aug-14  B     14 17.66667
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download