user92519 - 2 months ago
R Question

Bagging with random forest, object not found even with MASS::Boston data set

I am trying to follow along with a textbook example from James et al.'s "An Introduction to Statistical Learning with Applications in R" and I am running into an error message I don't understand.

library(MASS)
library(randomForest)
set.seed(1)
bag.boston=randomForest(medv~.,data=Boston, subset=train,mtry=13, importance=TRUE)
yhat.bag = predict(bag.boston,newdata=Boston[-train,])


With this last line I get the error message:


Error in eval(expr, envir, enclos) : object 'age' not found


Why am I getting this error message and how do I prevent it? I see that a similar question was asked in "Error in running randomForest : object not found", but in that case the OP was trying to pass a matrix rather than a data frame as their original data set, and anyhow that error occurred at the randomForest call rather than the predict call.

The asker of "randomForest in R object not found error" also had a similar problem, but traced it to non-ASCII characters in their text file, which I am pretty sure is not an issue with this data set.

Maybe I am supposed to substitute "data" for "newdata" in the predict function, but that seems to yield really different answers from the ones I see in the text examples.

Any other thoughts?

Answer

I found a copy of that book you're referring to, which has been published online by the author and USC.

Your snippet leaves out code that it depends on. In the book, the example comes from a single R session that is split across many pages and code blocks, so each block relies on the earlier ones having been run, for example the one that creates train. With that earlier code included, the snippet runs fine, and I cannot reproduce the error you got.
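A quick way to spot this kind of gap in your own session is to check that train exists as a numeric index before calling predict, and that the test rows carry every predictor the formula expands to. The sketch below uses only base R and MASS. missing_predictors is a hypothetical helper written for this answer, not a randomForest function, and none of this is a diagnosis of what actually happened in your session.

```r
library(MASS)  # provides the Boston data frame

# Is `train` defined as a numeric index vector? mode = "numeric" skips
# functions that happen to be named `train` (caret exports one).
has_train <- exists("train", mode = "numeric")

# Hypothetical helper: predictors the formula needs that `newdata` lacks.
missing_predictors <- function(formula, train_data, newdata) {
  # terms() expands the "." in medv ~ . against the training columns
  needed <- all.vars(delete.response(terms(formula, data = train_data)))
  setdiff(needed, names(newdata))
}

# Example: a test set whose `age` column was dropped by mistake
broken <- Boston[, setdiff(names(Boston), "age")]
missing_predictors(medv ~ ., Boston, Boston)   # nothing missing
missing_predictors(medv ~ ., Boston, broken)   # reports "age"
```

A non-empty result from missing_predictors is the kind of situation in which predict typically fails with an "object '...' not found" message, because the model frame cannot find that column in newdata.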

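library(MASS)          # Boston data set
library(randomForest)
library(tree)
set.seed(1)
# Earlier step from the book: a random half of the rows as the training index
train = sample(1:nrow(Boston), nrow(Boston)/2)
tree.boston = tree(medv ~ ., Boston, subset=train)

# Bagging: mtry=13 considers all 13 predictors at every split
bag.boston = randomForest(medv~., data=Boston, subset=train, mtry=13, importance=TRUE)
yhat.bag = predict(bag.boston, newdata=Boston[-train,])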

summary(yhat.bag)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  7.965  17.050  21.330  22.700  25.530  48.690
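
On the side question of swapping newdata for data: as far as I can tell from the randomForest documentation, predict has no data argument, so a data= value falls into ... and the call behaves as if newdata were missing. In that case it returns the out-of-bag predictions for the training rows rather than predictions for the held-out rows, which would explain the different numbers. A minimal sketch, assuming the randomForest package is installed; ntree = 50 and the names fit, yhat_test and yhat_oob are my choices to keep it short, not the book's:

```r
library(MASS)
library(randomForest)

set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
fit <- randomForest(medv ~ ., data = Boston, subset = train,
                    mtry = 13, ntree = 50)

# Intended call: predictions for the held-out half
yhat_test <- predict(fit, newdata = Boston[-train, ])

# `data` is not a predict() argument here, so newdata counts as missing
# and the out-of-bag predictions for the training rows come back instead
yhat_oob <- predict(fit, data = Boston[-train, ])

# Compare against the OOB predictions stored on the fitted object
isTRUE(all.equal(unname(yhat_oob), unname(fit$predicted)))
```

Because the training and test halves have the same number of rows here, the two results have the same length, so comparing against fit$predicted (or looking at the row names) is a more reliable check than comparing lengths.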