I am trying to calculate the absolute difference between lagged values across several columns. The first row of the resulting data set is NA, which is what I expect because there is no previous value to take a difference from. What I don't understand is why the difference isn't calculated for the last value. In the example below, the last value in temp is the difference between the 2nd-to-last and 3rd-to-last values; the difference between the last and 2nd-to-last values is missing.
library(tidyverse)
library(purrr)
dim(mtcars) # 32 rows
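temp <- map_df(mtcars, ~ abs(diff(lag(.x))))  # lag() each column, then take differences of the lagged column
names(temp) <- paste(names(temp), '.abs.diff.lag', sep = '')  # add a suffix to every column name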
dim(temp) # 31 rows
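A short vector (values chosen purely for illustration) makes the behaviour described above easy to see. The comments show what each line returns. dplyr::lag() is called with its namespace because stats::lag() shifts the time base of a time series rather than the values of a plain vector.

```r
x <- c(1, 4, 9, 16)

dplyr::lag(x)            # NA 1 4 9    (16 is shifted off the end)
diff(dplyr::lag(x))      # NA 3 5      (16 - 9 is never computed)
diff(x)                  # 3 5 7       (every consecutive difference)
abs(x - dplyr::lag(x))   # NA 3 5 7    (same differences, aligned with x)
```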
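I think the lag() call in your existing code is unnecessary, because diff() already computes the lagged difference (x[i] - x[i-1]) by itself (although perhaps I don't properly understand what you are trying to do). Calling lag() first shifts each column down by one position and drops its last value, so diff() never sees the final observation; that is why the last difference is missing. You can also use rename_with() (which supersedes rename_all() from dplyr 1.0.0 onwards) to add a suffix to all the variable names.
library(purrr)
library(dplyr)
mtcars %>%
  map_df(~ abs(diff(.x))) %>%                  # diff() already returns x[i] - x[i-1]
  rename_with(~ paste0(.x, ".abs.diff.lag"))   # add the suffix to every column name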
#> # A tibble: 31 x 11
#> mpg.abs.diff.lag cyl.abs.diff.lag disp.abs.diff.lag hp.abs.diff.lag
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.0 0 0.0 0
#> 2 1.8 2 52.0 17
#> 3 1.4 2 150.0 17
#> 4 2.7 2 102.0 65
#> 5 0.6 2 135.0 70
#> 6 3.8 2 135.0 140
#> 7 10.1 4 213.3 183
#> 8 1.6 0 5.9 33
#> 9 3.6 2 26.8 28
#> 10 1.4 0 0.0 0
#> # ... with 21 more rows, and 7 more variables: drat.abs.diff.lag <dbl>,
#> # wt.abs.diff.lag <dbl>, qsec.abs.diff.lag <dbl>, vs.abs.diff.lag <dbl>,
#> # am.abs.diff.lag <dbl>, gear.abs.diff.lag <dbl>,
#> # carb.abs.diff.lag <dbl>
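If the goal is a result that stays aligned with the original rows (NA in the first row and the last-vs-2nd-to-last difference in the last row, as the question expected), one option is to subtract the lagged column instead of calling diff(). This is a sketch that assumes dplyr 1.0.0 or later, for across(), its .names argument, and mutate(.keep = "none").

```r
library(dplyr)

# For every column, take |x[i] - x[i-1]|. dplyr::lag() pads the first
# position with NA, so the result keeps all 32 rows of mtcars:
# row 1 is NA and row 32 holds the difference between rows 32 and 31.
temp <- mtcars %>%
  mutate(
    across(everything(), ~ abs(.x - dplyr::lag(.x)),
           .names = "{.col}.abs.diff.lag"),
    .keep = "none"  # keep only the newly created columns
  )

dim(temp)  # 32 11
```

Rows 2 to 32 of each column hold the same values as the 31 rows returned by the diff() approach; the only difference is the leading NA.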