user98235 - 6 months ago
R Question

Creating columns of differences faster in R

Let's say I created the following data frame in R

c1 <- sample(10)
c2 <- sample(10)
c3 <- sample(10)
df1 <- data.frame(c1, c2, c3)

I would like to create a new data frame that contains the difference between each row and the previous row of df1.

Of course, I can create it manually as follows:

c4 <- df1$c1[2:nrow(df1)]-df1$c1[1:(nrow(df1)-1)]
c5 <- df1$c2[2:nrow(df1)]-df1$c2[1:(nrow(df1)-1)]
c6 <- df1$c3[2:nrow(df1)]-df1$c3[1:(nrow(df1)-1)]
df2 <- data.frame(c4, c5, c6)

but instead of having to define them one by one, I was wondering if there are more efficient ways of creating the columns.

Also, if I wanted to take differences for only certain columns, is there a fast way of doing so once I have the list of column names?


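Using data.table, we convert the 'data.frame' to a 'data.table' (setDT(df1)), loop through the columns (.SD) with lapply, get the lag of each column with shift, and subtract it from the original values. The first element of each result is NA (there is no previous row), so it is dropped with [-1], and setnames renames the new columns to c4, c5, c6.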

library(data.table)
setnames(setDT(df1)[, lapply(.SD, function(x) (x - shift(x))[-1])], paste0("c", 4:6))[]

Or using dplyr

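library(dplyr)
df1 %>%
    mutate(across(everything(), ~ .x - lag(.x))) %>%  # across() replaces the deprecated mutate_each()/funs()
    slice(-1)  # drop the first row, which is all NA because it has no previous row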

Or a base R option is

tail(df1,-1) - head(df1,-1)
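
The same subtraction also covers the second part of the question (differencing only selected columns). What follows is a minimal sketch, not a definitive solution: the vector cols and the "d_" prefix on the new names are hypothetical choices for illustration, and set.seed is only there to make the example reproducible.

```r
# Reproducible example data, same shape as the question's df1
set.seed(1)
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

# Hypothetical selection: any character vector of existing column names works
cols <- c("c1", "c3")

# Row i+1 minus row i, restricted to the selected columns
sel <- df1[cols]
df2 <- tail(sel, -1) - head(sel, -1)

# Optional tidy-up: rename the columns and reset the row names
# (otherwise they are inherited from the tail() subset)
names(df2) <- paste0("d_", cols)
rownames(df2) <- NULL
```

Because cols is an ordinary character vector, it can be built programmatically, for example from names(df1) with a filter.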

Or another option is

sapply(df1, diff)

Note that sapply returns a matrix here rather than a data frame. Also, diff would be slower than subtracting directly or using shift, which matters because the OP's question is about performance.
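
To check the speed difference on your own machine, a rough timing sketch like the one below may help. The data size and the helper names direct and via_diff are assumptions made for illustration. No timings are quoted, since they depend on the machine, the R version, and the data.

```r
# Larger illustrative data so timing differences are measurable (size is arbitrary)
set.seed(1)
n <- 1e5
big <- data.frame(c1 = runif(n), c2 = runif(n), c3 = runif(n))

# Direct subtraction on each column: x[2:n] - x[1:(n-1)]
direct <- function(d) {
  idx <- seq_len(nrow(d) - 1L)
  as.data.frame(lapply(d, function(x) x[-1L] - x[idx]))
}

# diff() applied per column; lapply + as.data.frame keeps a data frame
# (sapply would return a matrix instead)
via_diff <- function(d) as.data.frame(lapply(d, diff))

# Both approaches should produce the same values
stopifnot(isTRUE(all.equal(direct(big), via_diff(big))))

# Elapsed times vary between runs and machines
system.time(direct(big))
system.time(via_diff(big))
```

For more reliable numbers, repeat each call several times instead of relying on a single system.time run.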