C8H10N4O2 - 2 months ago 9x
R Question

# Take column-wise differences across a data.table

How can I use data.table syntax to produce a data.table where each column contains the differences between the column of the original data.table and the next column?

Example: I have a data.table where each row is a group, and each column is surviving population after year 0, after year 1, 2, etc. Such as:

``````pop <- data.table(group_id = c(1, 2, 3),
N = c(4588L, 4589L, 4589L),
N_surv_1 = c(4213, 4243, 4264),
N_surv_2 = c(3703, 3766, 3820),
N_surv_3 = c(2953, 3054, 3159) )
# group_id    N N_surv_1 N_surv_2 N_surv_3
#        1 4588     4213     3703     2953
#        2 4589     4243     3766     3054
#        3 4589     4264     3820     3159
``````

(Data types differ because N is a true integer count and N_surv_1, etc. are projections that could be fractional.)

What I have done: using the base
`diff`
and matrix transposition, we can:

``````diff <- data.table(t(diff(t(as.matrix(pop[,-1,with=FALSE])))))
setnames(diff, paste0("deaths_",1:ncol(diff)))
cbind(group_id = pop[,group_id],diff)
# produces desired output:
#    group_id deaths_1 deaths_2 deaths_3
#           1     -375     -510     -750
#           2     -346     -477     -712
#           3     -325     -444     -661
``````

I know that I can use base
`diff`
by group on a single column produced by
`melt.data.table`
, so this works but ain't pretty:

``````melt(pop,
id.vars = "group_id"
)[order(group_id)][, setNames(as.list(diff(value)),
paste0("deaths_",1:(ncol(pop)-2)) ),
keyby = group_id]
``````

Is that the most data.table-riffic way to do this, or is there a way to do it as a multi-column operation in data.table?

Well, you could subtract the subsets:

``````ncols = grep("^N(_surv_[0-9]+)?", names(pop), value=TRUE)
pop[, Map(
`-`,
utils:::tail.default(.SD, -1),
), .SDcols=ncols]

#    N_surv_1 N_surv_2 N_surv_3
# 1:     -375     -510     -750
# 2:     -346     -477     -712
# 3:     -325     -444     -661
``````

You could assign these values to new columns with `:=`. I have no idea why `tail` and `head` are not made more easily available... As pointed out by @akrun, you could use `with=FALSE` instead, like `pop[, .SD[, -1, with=FALSE] - .SD[, -ncol(.SD), with=FALSE], .SDcols=ncols]`.

Anyway, this is pretty convoluted compared to simply reshaping:

``````melt(pop, id="group_id")[, tail(value, -1) - head(value, -1), by=group_id]
#    group_id   V1
# 1:        1 -375
# 2:        1 -510
# 3:        1 -750
# 4:        2 -346
# 5:        2 -477
# 6:        2 -712
# 7:        3 -325
# 8:        3 -444
# 9:        3 -661
``````