Anirban Mukherjee - 1 month ago 4x
R Question

# Non-equi join, then summarize by group

Here is a MWE.

``````dta <- data.table(id=rep(1:2, each=5), seq=rep(1:5, 2), val=1:10)
dtb <- data.table(id=c(1, 1, 2, 2), fil=c(2, 3, 3, 4))
dtc <- data.table(id=c(1, 1, 2, 2), mval=rep(0, 4))
for (ind in 1:4) dtc\$mval[ind] <- mean( dta\$val [dta\$id == dtb\$id[ind] & dta\$seq < dtb\$fil[ind]] )

dtc
#    id      mval
# 1:  1       1.0
# 2:  1       1.5
# 3:  2       6.5
# 4:  2       7.0
``````

dtc should have the same number of rows as dtb. For every (row)
`ind`
in dtc,

1. `dtc\$id[ind]`
=
`dtb\$id[ind]`
.

2. `dtc\$mval[ind]`
=
`mean(dta\$val[x])`
, where x is
`dta\$id == dtb\$id[ind] & dta\$seq < dtb\$fil[ind]`
.

My data.tables are extremely large. Hence, I am looking for a way to achieve the above with minimal memory footprint. I was thinking a non-equi join and then a summarize, but I can't seem to get that to work. Hence, the title of the question.

Would greatly appreciate any help, thanks!

``````dtc[, mval := dta[dtb, mean(val) ,on =.(id, seq < fil), by = .EACHI]\$V1]