C8H10N4O2 C8H10N4O2 - 3 months ago 12
R Question

Take column-wise differences across a data.table

How can I use data.table syntax to produce a data.table where each column contains the differences between the column of the original data.table and the next column?

Example: I have a data.table where each row is a group, and each column is surviving population after year 0, after year 1, 2, etc. Such as:

pop <- data.table(group_id = c(1, 2, 3),
N = c(4588L, 4589L, 4589L),
N_surv_1 = c(4213, 4243, 4264),
N_surv_2 = c(3703, 3766, 3820),
N_surv_3 = c(2953, 3054, 3159) )
# group_id N N_surv_1 N_surv_2 N_surv_3
# 1 4588 4213 3703 2953
# 2 4589 4243 3766 3054
# 3 4589 4264 3820 3159


(Data types differ because N is a true integer count and N_surv_1, etc. are projections that could be fractional.)

What I have done: using the base
diff
and matrix transposition, we can:

diff <- data.table(t(diff(t(as.matrix(pop[,-1,with=FALSE])))))
setnames(diff, paste0("deaths_",1:ncol(diff)))
cbind(group_id = pop[,group_id],diff)
# produces desired output:
# group_id deaths_1 deaths_2 deaths_3
# 1 -375 -510 -750
# 2 -346 -477 -712
# 3 -325 -444 -661


I know that I can use base
diff
by group on a single column produced by
melt.data.table
, so this works but ain't pretty:

melt(pop,
id.vars = "group_id"
)[order(group_id)][, setNames(as.list(diff(value)),
paste0("deaths_",1:(ncol(pop)-2)) ),
keyby = group_id]


Is that the most data.table-riffic way to do this, or is there a way to do it as a multi-column operation in data.table?

Answer

Well, you could subtract the subsets:

ncols = grep("^N(_surv_[0-9]+)?", names(pop), value=TRUE)
pop[, Map(
  `-`, 
   utils:::tail.default(.SD, -1), 
   utils:::head.default(.SD, -1)
), .SDcols=ncols]

#    N_surv_1 N_surv_2 N_surv_3
# 1:     -375     -510     -750
# 2:     -346     -477     -712
# 3:     -325     -444     -661

You could assign these values to new columns with :=. I have no idea why tail and head are not made more easily available... As pointed out by @akrun, you could use with=FALSE instead, like pop[, .SD[, -1, with=FALSE] - .SD[, -ncol(.SD), with=FALSE], .SDcols=ncols].

Anyway, this is pretty convoluted compared to simply reshaping:

melt(pop, id="group_id")[, tail(value, -1) - head(value, -1), by=group_id]
#    group_id   V1
# 1:        1 -375
# 2:        1 -510
# 3:        1 -750
# 4:        2 -346
# 5:        2 -477
# 6:        2 -712
# 7:        3 -325
# 8:        3 -444
# 9:        3 -661