gjabel gjabel - 2 months ago 16
R Question

Grouped time series lag on selected variables using dplyr

I am trying to use dplyr to lag some variables (all of which have a common naming convention) for each group in my data set.

I thought mutate_if would work, but I get an error (below). mutate_each works, but it applies to all columns rather than the selected few.

For example, suppose I want to lag only the Sepal measurements:

iris %>%
     tbl_df() %>%
     group_by(Species) %>%
     slice(1:3) %>%
     # mutate_each(funs(lag(.)))
     mutate_if(contains("Sepal"), funs(lag(.)))
#> Error in get(as.character(FUN), mode = "function", envir = envir) : object 'p' of mode 'function' was not found


to get a final data set like:

#   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#          <dbl>       <dbl>        <dbl>       <dbl>     <fctr>
# 1           NA          NA          1.4         0.2     setosa
# 2          5.1         3.5          1.4         0.2     setosa
# 3          4.9         3.0          1.3         0.2     setosa
# 4           NA          NA          4.7         1.4 versicolor
# 5          7.0         3.2          4.5         1.5 versicolor
# 6          6.4         3.2          4.9         1.5 versicolor
# 7           NA          NA          6.0         2.5  virginica
# 8          6.3         3.3          5.1         1.9  virginica
# 9          5.8         2.7          5.9         2.1  virginica

Answer

This seems to work:

library(dplyr)
iris %>% 
     tbl_df() %>%
     group_by(Species) %>%
     slice(1:3) %>%
     mutate_if(grepl('Sepal', names(.)), funs(lag(.)))

As @aosmith explains, contains returns the integer positions of the columns whose names match the string, whereas mutate_if expects a predicate function or a logical vector. That is why the grepl option works: it returns one TRUE/FALSE value per column name.
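The difference between a position index and a logical vector can be seen with base R alone. This is only an analogy, not dplyr's internals: grep stands in for the kind of index that contains produces, and the variable names cols, idx and is_sepal are made up for this sketch.

```r
# Illustration with base R only; this is an analogy, not how dplyr works inside.
cols <- names(iris)

# grep() returns the integer positions of the matching names.
idx <- grep("Sepal", cols)

# grepl() returns one TRUE/FALSE per name, which is the shape that
# mutate_if can use as a column filter.
is_sepal <- grepl("Sepal", cols)

print(idx)       # 1 2
print(is_sepal)  # TRUE TRUE FALSE FALSE FALSE
```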

In addition, as @StevenBeaupre mentions, mutate_at accepts select helpers such as contains when they are wrapped in vars:

iris %>% 
     tbl_df() %>%
     group_by(Species) %>%
     slice(1:3) %>% 
     mutate_at(vars(contains('Sepal')), lag)
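For readers on a newer dplyr, the same grouped lag can probably be written with across. This is a sketch that assumes dplyr 1.0.0 or later, where across and as_tibble are available. It is not part of the original answers, and the name lagged is only illustrative.

```r
library(dplyr)

# Assumes dplyr >= 1.0.0, which provides across().
lagged <- iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  slice(1:3) %>%
  # Apply dplyr's lag() to every column whose name contains "Sepal".
  # Because the data are grouped, the lag restarts within each Species.
  mutate(across(contains("Sepal"), lag)) %>%
  ungroup()

print(lagged)
```

The first row of each species should come out as NA in the Sepal columns, while the Petal columns stay unchanged, matching the desired output in the question.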