dcl dcl - 10 months ago 34
R Question

Object not found error when passing model formula to another function

I have a weird problem with R that I can't seem to work out.

I've tried to write a function that performs K-fold cross validation for a model chosen by the stepwise procedure in R. (I'm aware of the issues with stepwise procedures, it's purely for comparison purposes) :)

Now the issue is, that if I define the function parameters (linmod,k,direction) and run the contents of the function, it works flawlessly. BUT, if I run it as a function, I get an error saying the datas.train object can't be found.

I've tried stepping through the function with debug() and the object clearly exists, but R says it doesn't when I actually run the function. If I just fit a model using lm() it works fine, so I believe it's a problem with the step function in the loop, while inside a function. (try commenting out the step command, and set the predictions to those from the ordinary linear model.)

lm.cars <- lm(mpg~.,data=mtcars,x=TRUE,y=TRUE)

cv.step <- function(linmod,k=10,direction="both"){
response <- linmod$y
dmatrix <- linmod$x
n <- length(response)
datas <- linmod$model
form <- formula(linmod$call)

# generate indices for cross validation
rar <- n/k
xval.idx <- list()
s <- sample(1:n, n) # permutation of 1:n
for (i in 1:k) {
xval.idx[[i]] <- s[(ceiling(rar*(i-1))+1):(ceiling(rar*i))]

#error calculation
errors <- R2 <- 0

for (j in 1:k){
datas.test <- datas[xval.idx[[j]],]
datas.train <- datas[-xval.idx[[j]],]
test.idx <- xval.idx[[j]]

lm.1 <- lm(form,data= datas.train)
lm.step <- step(lm.1,direction=direction,trace=0)

step.pred <- predict(lm.step,newdata= datas.test)
step.error <- sum((step.pred-response[test.idx])^2)
errors[j] <- step.error/length(response[test.idx])

SS.tot <- sum((response[test.idx] - mean(response[test.idx]))^2)
R2[j] <- 1 - step.error/SS.tot

CVerror <- sum(errors)/k
CV.R2 <- sum(R2)/k

res <- list()
res$CV.error <- CVerror
res$CV.R2 <- CV.R2



Any thoughts?

Answer Source

When you created your formula, lm.cars, in was assigned its own environment. This environment stays with the formula unless you explicitly change it. So when you extract the formula with the formula function, the original environment of the model is included.

I don't know if I'm using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:

cv.step <- function(linmod,k=10,direction="both"){
  response <- linmod$y
  dmatrix <- linmod$x
  n <- length(response)
  datas <- linmod$model
  .env <- environment() ## identify the environment of cv.step

  ## extract the formula in the environment of cv.step
  form <- as.formula(linmod$call, env = .env) 

  ## The rest of your function follows