user98235 - 1 year ago 73

R Question

Let's say I created the following data frame in R

```
c1 <- sample(10)
c2 <- sample(10)
c3 <- sample(10)
df1 <- data.frame(c1, c2, c3)
```

I would like to create a new data frame that holds the difference between each row and the previous row of df1.

Of course, I can create it manually as follows:

```
c4 <- df1$c1[2:nrow(df1)]-df1$c1[1:(nrow(df1)-1)]
c5 <- df1$c2[2:nrow(df1)]-df1$c2[1:(nrow(df1)-1)]
c6 <- df1$c3[2:nrow(df1)]-df1$c3[1:(nrow(df1)-1)]
df2 <- data.frame(c4, c5, c6)
```

but instead of having to define them one by one, I was wondering if there are more efficient ways of creating the columns.

Also, if I wanted to take the difference for only certain columns, is there a fast way of doing so once I have the list of column names?

Answer Source

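We loop through the columns, get the lag of each with `shift`, and subtract it from the original value; the `[-1]` drops the leading `NA` that the lag produces for the first row. The 'data.frame' is first converted to a 'data.table' by reference with `setDT(df1)`.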

```
library(data.table)
setnames(setDT(df1)[, lapply(.SD, function(x) (x- shift(x))[-1])], paste0("c", 4:6))[]
```

Or using `dplyr`

```
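library(dplyr)
# mutate_each()/funs() are deprecated in current dplyr; across() needs dplyr >= 1.0.0
df1 %>%
  mutate(across(everything(), ~ .x - lag(.x))) %>%
  na.omit()  # drop the first row, whose lag is NA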

Or a base R option is

```
tail(df1,-1) - head(df1,-1)
```
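The same `head`/`tail` subtraction can be restricted to a subset of columns, which speaks to the second part of the question. Below is a minimal sketch, assuming a hypothetical character vector `cols` that holds the names of the columns to difference. The `d_` prefix for the new column names is only an illustration.

```r
# Example data, built as in the question
set.seed(1)  # only so the sketch is reproducible
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

# Hypothetical selection of columns to difference
cols <- c("c1", "c3")

# Row-to-row differences for the selected columns only
sub <- df1[, cols, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
df2 <- tail(sub, -1) - head(sub, -1)

# Optional tidy-up: rename the columns and reset the row names
names(df2) <- paste0("d_", cols)
rownames(df2) <- NULL
```

Using `drop = FALSE` means the same code works whether `cols` names one column or several.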

Or another option is

```
sapply(df1, diff)
```
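Note that `sapply` simplifies its result to a matrix rather than a data frame. If a data frame like the question's `df2` is wanted, one possible sketch uses `lapply` and converts the resulting list. The column names `c4` to `c6` follow the question.

```r
# Example data, built as in the question
set.seed(1)  # only so the sketch is reproducible
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

m <- sapply(df1, diff)   # matrix with one row fewer than df1
is.matrix(m)

# lapply() returns a list of differenced columns; convert it to a data frame
df2 <- as.data.frame(lapply(df1, diff))
names(df2) <- paste0("c", 4:6)
```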

However, `diff` would be slower than subtracting directly or using `shift`, which matters here because the OP asked about efficiency.
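Relative speed can depend on the data size and the R version, so it may be worth measuring on your own data. Below is a rough sketch using only base R's `system.time`. The vector length and the repetition count are arbitrary choices for illustration, and no particular result is implied.

```r
set.seed(1)        # only so the sketch is reproducible
x <- sample(1e6)   # arbitrary vector length
n <- length(x)

# Time 20 repetitions of each approach
t_diff <- system.time(for (i in 1:20) d1 <- diff(x))
t_sub  <- system.time(for (i in 1:20) d2 <- x[-1] - x[-n])

# Both approaches should produce the same values
stopifnot(identical(d1, d2))

# Elapsed seconds for each approach
print(c(diff = t_diff[["elapsed"]], subtract = t_sub[["elapsed"]]))
```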