Sairam Reddipalli Sairam Reddipalli - 1 month ago 10
R Question

Mutate a lead or lag column for certain rows in r

I have a forecast that doesn't quite align with the moving holidays. I am trying to find a quick fix:

Here is the structure of my data frame:

df1:
Date City Visitors WKN WKN_2015 Holiday
2016-11-06 New York 40000 45 46 No_Holiday
2016-11-13 New York 50000 46 47 No_Holiday
2016-11-20 New York 50000 47 48 Thanksgiving
2016-11-27 New York 100000 48 49 Cyber_Monday
2016-12-04 New York 100000 49 50 No_Holiday
2016-12-11 New York 70000 50 51 No_Holiday
.
.
.
2017-11-23 New York 120000 47 47 Thanksgiving


Generally you would have more visitors to the city on Thanksgiving day and Cyber Monday. But my forecast is not reflecting that. Now I would like a quick fix with some thing like this:

df1:
Date City Visitors WKN WKN_2015 Holiday New_Visitors
2016-11-06 New York 40000 45 46 No_Holiday 40000
2016-11-13 New York 50000 46 47 No_Holiday 50000
2016-11-20 New York 50000 47 48 Thanksgiving 100000
2016-11-27 New York 100000 48 49 Cyber_Monday 100000
2016-12-04 New York 100000 49 50 No_Holiday 70000
2016-12-11 New York 70000 50 51 No_Holiday 70000
.
.
.
2017-11-23 New York 120000 47 47 Thanksgiving 120000


If you see the above data The new volume only changed for Thanksgiving, Cyber Monday and a week after Cyber Monday.
Is there any way to automate this because the data continues for 2017 and so on.

I was thinking of a quick fix until I develop a forecast to suit the moving holidays. Can anyone point me in the right direction?

I have tried something like this but this doesn't work because I need lag/lead for only those 3 Weeks:

df1 <-
df1 %>%
mutate(New_Visitors = ifelse(Holiday == "Thanksgiving", lag(Visitors, (WKN - WKN_2015), Visitors)


Logic: Look up for thanksgiving each year and see if the WKN's match. If don't then adjust Visitors for next 3 Weeks starting from thanksgiving based on difference between WKN's. If WKN-WKN_2015 == -1 then lead Visitors by 1 for next 3 rows and if WKN-WKN_2015 == 1 then lag Visitors by 1 for next 3 rows

data

df1 <- structure(list(Date = c("2016-11-06", "2016-11-13", "2016-11-20",
"2016-11-27", "2016-12-04", "2016-12-11", "2017-11-23"), City = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L), .Label = "New York", class = "factor"),
Visitors = c(40000L, 50000L, 50000L, 100000L, 100000L, 70000L,
120000L), WKN = c(45L, 46L, 47L, 48L, 49L, 50L, 47L), WKN_2015 = c(46L,
47L, 48L, 49L, 50L, 51L, 47L), Holiday = structure(c(2L,
2L, 3L, 1L, 2L, 2L, 3L), .Label = c("Cyber_Monday", "No_Holiday",
"Thanksgiving"), class = "factor")), .Names = c("Date", "City",
"Visitors", "WKN", "WKN_2015", "Holiday"), row.names = c(NA,
7L), class = "data.frame")

Answer

You are interest in only three weeks per year and you can calculate the lag value in the "Thanksgiving" row. I don't think dplyr is needed to do it.

df1$New_Visitors <- df1$Visitors             # copy Visitors
ind <- which(df1$Holiday == "Thanksgiving")  # get number of "Thanksgiving" rows

invisible(sapply(ind, function(x) {
  lag <- df1[x, "WKN_2015"] - df1[x, "WKN"]       # calculate the lag
  df1[x:(x+2), "New_Visitors"] <<- df1[(x+lag):(x+lag+2), "Visitors"]  # rewrite
}))
> df1  # this method treats the three weeks as a unit, so made two NA rows in the example data)
        Date     City Visitors WKN WKN_2015      Holiday New_Visitors
1 2016-11-06 New York    40000  45       46   No_Holiday        40000
2 2016-11-13 New York    50000  46       47   No_Holiday        50000
3 2016-11-20 New York    50000  47       48 Thanksgiving       100000
4 2016-11-27 New York   100000  48       49 Cyber_Monday       100000
5 2016-12-04 New York   100000  49       50   No_Holiday        70000
6 2016-12-11 New York    70000  50       51   No_Holiday        70000
7 2017-11-23 New York   120000  47       47 Thanksgiving       120000
8       <NA>     <NA>       NA  NA       NA         <NA>           NA
9       <NA>     <NA>       NA  NA       NA         <NA>           NA
Comments