Ben Ben - 2 months ago 12
R Question

How to do non-equi join with variable column name

How can I do a non-equi join in data.table 1.9.7 with a variable column name? For example,

Known column name:

library(data.table)
dt <- data.table(x=round(rnorm(10)), y=rnorm(10))
binDT <- data.table(LB=c(-Inf, -1, 0, .2, .7, 1.5, 3), RB=c(-1, 0, .2, .7, 1.5, 3, Inf))
dt[binDT, on=.(x>=LB, x<RB)]
x y x.1
1: -Inf 2.2669 -1.0
2: -1.0 -0.5453 0.0
3: -1.0 0.5125 0.0
4: 0.0 1.4151 0.2
5: 0.0 -0.1440 0.2
6: 0.0 -1.1802 0.2
7: 0.0 0.3338 0.2
8: 0.0 -1.8220 0.2
9: 0.2 NA 0.7
10: 0.7 0.3155 1.5
11: 0.7 -0.6284 1.5
12: 1.5 NA 3.0
13: 3.0 NA Inf


Variable column name:

colName <- "x"
dt[binDT, on=.(get(colName)>=LB, get(colName)<RB)] # Error
dt[binDT, on=eval(parse(text="list(x>=LB, x<RB)"))] # Error

Answer

@Shape answer is fine, but there is easier way to achieve it. on argument can take character vector, so it can be a matter of pasting expected columns and operators.

colName="x"
on=sprintf(c("%s>=LB","%s<RB"), colName)
print(on)
#[1] "x>=LB" "x<RB"
dt[binDT, on=on]
#       x           y  x.1
# 1: -Inf          NA -1.0
# 2: -1.0  0.48127355  0.0
# 3:  0.0  0.11779604  0.2
# 4:  0.0 -0.97891522  0.2
# 5:  0.0 -0.05969859  0.2
# 6:  0.0 -0.05625401  0.2
# 7:  0.2          NA  0.7
# 8:  0.7 -0.84438216  1.5
# 9:  0.7  0.80151913  1.5
#10:  1.5 -0.11013456  3.0
#11:  1.5  0.82139242  3.0
#12:  3.0 -1.24386831  Inf