dcl - 1 year ago 48

R Question

I have a weird problem with R that I can't seem to work out.

I've tried to write a function that performs K-fold cross validation for a model chosen by the stepwise procedure in R. (I'm aware of the issues with stepwise procedures, it's purely for comparison purposes) :)

Now the issue is, that if I define the function parameters (linmod,k,direction) and run the contents of the function, it works flawlessly. BUT, if I run it as a function, I get an error saying the datas.train object can't be found.

I've tried stepping through the function with debug() and the object clearly exists, but R says it doesn't when I actually run the function. If I just fit a model using lm() it works fine, so I believe it's a problem with the step function in the loop, while inside a function. (try commenting out the step command, and set the predictions to those from the ordinary linear model.)

`#CREATE A LINEAR MODEL TO TEST FUNCTION`

lm.cars <- lm(mpg~.,data=mtcars,x=TRUE,y=TRUE)

#THE FUNCTION

cv.step <- function(linmod,k=10,direction="both"){

response <- linmod$y

dmatrix <- linmod$x

n <- length(response)

datas <- linmod$model

form <- formula(linmod$call)

# generate indices for cross validation

rar <- n/k

xval.idx <- list()

s <- sample(1:n, n) # permutation of 1:n

for (i in 1:k) {

xval.idx[[i]] <- s[(ceiling(rar*(i-1))+1):(ceiling(rar*i))]

}

#error calculation

errors <- R2 <- 0

for (j in 1:k){

datas.test <- datas[xval.idx[[j]],]

datas.train <- datas[-xval.idx[[j]],]

test.idx <- xval.idx[[j]]

#THE MODELS+

lm.1 <- lm(form,data= datas.train)

lm.step <- step(lm.1,direction=direction,trace=0)

step.pred <- predict(lm.step,newdata= datas.test)

step.error <- sum((step.pred-response[test.idx])^2)

errors[j] <- step.error/length(response[test.idx])

SS.tot <- sum((response[test.idx] - mean(response[test.idx]))^2)

R2[j] <- 1 - step.error/SS.tot

}

CVerror <- sum(errors)/k

CV.R2 <- sum(R2)/k

res <- list()

res$CV.error <- CVerror

res$CV.R2 <- CV.R2

return(res)

}

#TESTING OUT THE FUNCTION

cv.step(lm.cars)

Any thoughts?

Answer Source

When you created your formula, `lm.cars`

, in was assigned its own environment. This environment stays with the formula unless you explicitly change it. So when you extract the formula with the `formula`

function, the original environment of the model is included.

I don't know if I'm using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:

```
cv.step <- function(linmod,k=10,direction="both"){
response <- linmod$y
dmatrix <- linmod$x
n <- length(response)
datas <- linmod$model
.env <- environment() ## identify the environment of cv.step
## extract the formula in the environment of cv.step
form <- as.formula(linmod$call, env = .env)
## The rest of your function follows
```