jf328 - 4 months ago 23

R Question

I have a datatable of data and a datatable of fitted coefficients. I want to calculate the fitted value for each row.

`dt = data.table(a = rep(c("x","y"), each = 5), b = rnorm(10), c = rnorm(10), d = rnorm(10))`

coefs = data.table(a = c("x","y"), b = c(0, 1), d = c(2,3))

dt

# a b c d

# 1: x -0.25174915 -0.2130797 -0.67909764

# 2: x -0.35569766 0.6014930 0.35201386

# 3: x -0.31600957 0.4398968 -1.15475814

# 4: x -0.54113762 -2.3497952 0.64503654

# 5: x 0.11227873 0.0233775 -0.96891456

# 6: y 1.24077566 -1.2843439 1.98883516

# 7: y -0.23819626 0.9950835 -0.17279980

# 8: y 1.49353589 0.3067897 -0.02592004

# 9: y 0.01033722 -0.5967766 -0.28536224

#10: y 0.69882444 0.8702424 1.24131062

coefs # NB no "c" column

# a b d

#1: x 0 2

#2: y 1 3

For each

`a=="x"`

`0*b+2*d`

`a=="y"`

`1*b+3*d`

Is there a datatable way to do this without hardcode the column name? I'm happy to put the column names in a variable

`cols = colnames(coefs)[-1]`

It's easy to loop over groups and

`rbind`

Answer

Join the data.tables:

```
dt[coefs, res := b * i.b + d * i.d, on = "a"]
# a b c d res
#1: x 0.09901786 -0.362080111 -0.5108862 -1.0217723
#2: x -0.16128422 0.169655945 0.3199648 0.6399295
#3: x -0.79648896 -0.502279345 1.3828633 2.7657266
#4: x -0.26121421 0.480548972 -1.1559392 -2.3118783
#5: x 0.54085591 -0.601323442 1.3833795 2.7667590
#6: y 0.83662761 0.607666970 0.6320762 2.7328562
#7: y -1.92510391 -0.050515610 -0.3176544 -2.8780671
#8: y 1.65639926 -0.167090105 0.6830158 3.7054466
#9: y 1.48772354 -0.349713539 -1.2736467 -2.3332166
#10: y 1.49065993 0.008198885 -0.1923361 0.9136516
```

Usually you would use the matrix product here, but that would mean that you had to coerce the respective subset to a matrix. That would result in a copy being made and since data.tables are mainly used for larger data, you want to avoid copies.

If you need dynamic column names, the most simple solution that comes to mind is actually an `eval`

/`parse`

construct:

```
cols = colnames(coefs)[-1]
expr <- parse(text = paste(paste(cols, paste0("i.", cols), sep = "*"), collapse = "+"))
#expression(b*i.b+d*i.d)
dt[coefs, res := eval(expr), on = "a"]
```

Maybe someone else can suggest a better solution.