N.R - 1 month ago
R Question

predict() warning: 'newdata' rows differ from variable rows

I'm a beginner in R.
I tried to build a model using part of the samples and then predict the response for the remaining samples. But when I call predict(), I get a warning message:

'newdata' had 152 rows but variables found have 354 rows

I have searched for answers, but I still don't understand what is going wrong. Please help.

library(MASS)
data(Boston)

n <- nrow(Boston)
n_train <- round(.70*n)
train_set <- sample(n,size=n_train,replace = FALSE)

x <- cbind(Boston$lstat,log(Boston$lstat))
y <- Boston$medv

x_train <- x[train_set,]
y_train <- y[train_set]

x_test <- x[-train_set,]
y_test <- y[-train_set]

lm_temp <- lm(y_train~x_train)
y_test_hat <- predict(lm_temp,newdata=data.frame(x_test))

Answer

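The model was fitted with the matrix x_train as its predictor, so predict() looks for a variable named x_train in newdata. data.frame(x_test) has columns named X1 and X2 instead, so R falls back to the x_train in your workspace, which has 354 rows. That is what the warning reports, and the values you get back are fitted values for the training rows, not predictions for the 152 test rows.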
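
The name mismatch can be checked directly. The sketch below repeats the setup from the question; the seed and the helper names model_terms, newdata_names and y_hat are only illustrative assumptions, not part of the original code.

```r
library(MASS)  # provides the Boston data set

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
n <- nrow(Boston)
train_set <- sample(n, size = round(0.70 * n), replace = FALSE)

x <- cbind(Boston$lstat, log(Boston$lstat))
y <- Boston$medv
x_train <- x[train_set, ]
y_train <- y[train_set]
x_test <- x[-train_set, ]

lm_temp <- lm(y_train ~ x_train)

# The model's only predictor term is the matrix `x_train` ...
model_terms <- attr(terms(lm_temp), "term.labels")
# ... but data.frame() gives the unnamed matrix columns default names.
newdata_names <- names(data.frame(x_test))

print(model_terms)
print(newdata_names)

# predict() cannot find `x_train` in newdata, so it uses the training
# matrix from the workspace and returns one value per training row.
y_hat <- suppressWarnings(predict(lm_temp, newdata = data.frame(x_test)))
print(length(y_hat) == nrow(x_train))
```

Because none of the names in newdata match the model term, the test rows are never used.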

You can solve the problem by fitting lm on a data frame, so that the model and newdata share the same column names:

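library(MASS)
data(Boston)

# Randomly pick 70% of the row indices for training
n <- nrow(Boston)
n_train <- round(.70 * n)
train_set <- sample(n, size = n_train, replace = FALSE)

# Keep the response and the predictor, and add the log-transformed predictor
data <- Boston[, c('medv', 'lstat')]
data$loglstat <- log(data$lstat)

train <- data[train_set, ]
test <- data[-train_set, ]

# Fit on named columns; predict() then finds the same names in test
lm_temp <- lm(medv ~ ., data = train)
y_test_hat <- predict(lm_temp, newdata = test)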
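
As a possible variation on the same idea, the transformation can be written inside the formula, so that predict() applies it to newdata itself. This is a minimal sketch, not the only way to do it; the seed and the names lm_fit and test_rmse are assumptions added for illustration.

```r
library(MASS)  # provides the Boston data set

set.seed(1)  # arbitrary seed (an assumption), only for reproducibility
n <- nrow(Boston)
train_set <- sample(n, size = round(0.70 * n), replace = FALSE)

train <- Boston[train_set, ]
test <- Boston[-train_set, ]

# log(lstat) inside the formula is re-evaluated on newdata by predict(),
# so no extra column has to be created beforehand.
lm_fit <- lm(medv ~ lstat + log(lstat), data = train)
y_test_hat <- predict(lm_fit, newdata = test)

# One prediction per held-out row.
print(length(y_test_hat) == nrow(test))

# Root mean squared error on the held-out rows.
test_rmse <- sqrt(mean((test$medv - y_test_hat)^2))
print(test_rmse)
```

Checking that the number of predictions equals nrow(test) is a quick way to confirm that the warning from the question is gone.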