rlh2 rlh2 - 1 year ago 38
R Question

R data.table Handling of Closures

I am using the data.table package to return a list of function closures in a J expression as output by the approxfun function from the stats package. Basically, on each Date, I would like a closure that allows me to calculate an arbitrary yval based on an arbitrary xval as determined by approxfun. However, approxfun is only valid when there are at least two unique values of x passed to the function. In the case where there is only one unique value of x, I would like to return a function that returns the one unique value of y. In the code below, I perform this step by check the .N value and returning a different function depending on whether or not .N is > 1.

library(data.table)
set.seed(10)
N <- 3
x <- data.table(Date = Sys.Date() + rep(1:N, each = 3), xval = c(0, 30, 90), yval = rnorm(N * 3))
x <- x[-c(2:3), ]

##interpolation happens correctly
x2 <- x[order(Date, xval), {
if(.N > 1){
afun <- approxfun(xval, yval, rule = 1)
}else{
afun <- function(v) yval
}
print(afun(30))
list(Date, afun = list(afun))
}, by = Date]

##evaluation does NOT happen correctly, the val used is the last...
sapply(x2[, afun], do.call, args = list(v = 30))


When evaluating the function 'afun' in the context of the J expression, the correct value of 'yval' is printed. However, when I go back after the fact to evaluate the first function, the yval returned is the last yval in the group created by the 'by' grouping for the function that is not created by approxfun (all the closures created by approxfun work as expected). My suspicion is that this has to do with something I am missing with lazy evaluation. I tried the additional code below using the 'force' function but was unsuccessful.

x3 <- x[order(Date, xval), {
if(.N > 1){
afun <- approxfun(xval, yval, rule = 1)
}else{
fn <- function(x){
force(x)
function(v) x
}
afun <- fn(yval)
}
print(afun(30))
list(Date, afun = list(afun))
}, by = Date]

sapply(x3[, afun], do.call, args = list(v = 30))


Has anyone else encountered this issue? Is it something I am missing with base R or something I am missing with data.table?

Thanks in advance for the help

Answer Source

Yes, typical data.table reference vs copy FAQ. This works as expected:

x2 <- x[order(Date, xval), {
  if(.N > 1){
    afun <- approxfun(xval, yval, rule = 1)
  }else{
    fn <- function(){
      #ensure the value is copied
      x <- copy(yval)
      function(v) x
    }
    afun <- fn()
  }
  print(afun(30))
  list(Date, afun = list(afun))
}, by = Date]
#[1] 0.01874617
#[1] 0.2945451
#[1] -0.363676

sapply(x2[, afun], do.call, args = list(v = 30))
#[1]  0.01874617  0.29454513 -0.36367602
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download