user98235 - 4 months ago
R Question

# Creating columns of differences faster in R

Let's say I created the following data frame in R:

```r
c1 <- sample(10)
c2 <- sample(10)
c3 <- sample(10)
df1 <- data.frame(c1, c2, c3)
```

I would like to create a new data frame that holds the difference between each row of df1 and the row before it.

Of course, I can create it manually as follows:

```r
c4 <- df1$c1[2:nrow(df1)] - df1$c1[1:(nrow(df1) - 1)]
c5 <- df1$c2[2:nrow(df1)] - df1$c2[1:(nrow(df1) - 1)]
c6 <- df1$c3[2:nrow(df1)] - df1$c3[1:(nrow(df1) - 1)]
df2 <- data.frame(c4, c5, c6)
```

But instead of defining the columns one by one, I was wondering if there is a more efficient way to create them.

Also, if I only wanted to take differences of certain columns, is there a fast way to do that once I have the list of column names?

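We convert the 'data.frame' to a 'data.table' by reference with `setDT(df1)`, loop through the columns with `lapply(.SD, ...)`, get the lag of each column with `shift`, and subtract it from the original values. The first element of each result is `NA` because the first row has no previous row, so `[-1]` drops it.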

```r
library(data.table)
# setDT() converts df1 to a data.table by reference; .SD holds all its columns
setnames(setDT(df1)[, lapply(.SD, function(x) (x - shift(x))[-1])],
         paste0("c", 4:6))[]
```
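
For the second part of the question (differencing only some columns), one possible sketch with `data.table` is to pass the column names through `.SDcols`. The names `dt`, `cols`, `res` and the `_diff` suffix are illustrative assumptions, not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dt <- data.table(c1 = sample(10), c2 = sample(10), c3 = sample(10))

cols <- c("c1", "c3")  # hypothetical list of columns to difference

# .SDcols restricts .SD to the named columns. shift(x) is x lagged by one,
# so x - shift(x) is NA in the first position, which [-1] removes.
res <- dt[, lapply(.SD, function(x) (x - shift(x))[-1]), .SDcols = cols]
setnames(res, paste0(cols, "_diff"))
res
```

The result has one row fewer than `dt` and only the selected columns.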

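Or using `dplyr`:

```r
library(dplyr)
# across() needs dplyr >= 1.0.0; it replaces the deprecated mutate_each(funs(...))
df1 %>%
  mutate(across(everything(), ~ .x - lag(.x))) %>%  # subtract the previous row
  na.omit()                                         # drop the first row (all NA)
```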

Or a `base R` option is

```r
tail(df1, -1) - head(df1, -1)
```

Or another option is

```r
sapply(df1, diff)
```
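
Note that `sapply` simplifies its result to a matrix here rather than a data frame. As a rough sketch, assuming the columns to difference are given as a character vector (the names `df`, `cols`, `m`, `out` and the `d_` prefix are made up for illustration), the same idea can be limited to selected columns and kept as a data frame:

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
df <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

cols <- c("c2", "c3")  # hypothetical list of columns to difference

# sapply() simplifies the per-column results into a 9 x 2 matrix
m <- sapply(df[cols], diff)

# lapply() keeps a list of columns, which as.data.frame() turns back
# into a data frame with one row fewer than df
out <- as.data.frame(lapply(df[cols], diff))
names(out) <- paste0("d_", cols)
out
```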

However, `diff` is likely to be slower than subtracting directly or using `shift`, which matters here because the question is about performance.
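
The speed difference is easy to check on your own data. The following is only a sketch using base R's `system.time`; the data frame `big`, its size `n`, and the variable names are assumptions for illustration, and the timings will vary by machine and R version, so no numbers are claimed here.

```r
set.seed(1)
n <- 1e5  # assumption: an arbitrary size, increase it for a clearer signal
big <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))

# Direct subtraction of the data frame without its first and last row
t_sub <- system.time(r1 <- tail(big, -1) - head(big, -1))

# Column-wise diff(), converted back to a data frame
t_diff <- system.time(r2 <- as.data.frame(lapply(big, diff)))

# Both approaches should give the same values (row names aside)
stopifnot(isTRUE(all.equal(r1, r2, check.attributes = FALSE)))

# Elapsed seconds for each approach
c(subtract = t_sub[["elapsed"]], diff = t_diff[["elapsed"]])
```

A single `system.time` call is noisy, so repeating the measurement several times gives a more reliable comparison.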

Source (Stackoverflow)