BDillan - 1 year ago
R Question

R - Calculate Test MSE given a trained model from a training set and a test set

Given two simple sets of data:

  x         y
1 1  2.167512
2 2  4.684017
3 3  3.702477
4 4  9.417312
5 5  9.424831
6 6 13.090983

  x        y
1 1 2.068663
2 2 4.162103
3 3 5.080583
4 4 8.366680
5 5 8.344651

I want to fit a linear regression line to the training data, then use that fitted line (or its coefficients) to calculate the "test MSE", i.e. the mean squared error of the residuals obtained when the fitted line is applied to the test data.

model = lm(y~x,data=training_set)
train_MSE = mean(model$residuals^2)
test_MSE = ?

Answer

In this case, it is more precise to call it MSPE (mean squared prediction error):

mean((test_set$y - predict(model, newdata = test_set))^2)

This is a more useful measure than the training MSE when the goal is prediction, because it is computed on data the model was not fitted to. We want a model with minimal MSPE.
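As a minimal sketch, the whole calculation can be run end to end on the data from the question. The question does not say which table is which, so treating the first (six rows) as `training_set` and the second (five rows) as `test_set` is an assumption here. The names `test_pred` and `test_MSE_manual` are introduced only for illustration; the second one shows the "use the coefficients" route the question mentions.

```r
# Data copied from the question.
# Assumption: first table = training set, second table = test set.
training_set <- data.frame(
  x = 1:6,
  y = c(2.167512, 4.684017, 3.702477, 9.417312, 9.424831, 13.090983)
)
test_set <- data.frame(
  x = 1:5,
  y = c(2.068663, 4.162103, 5.080583, 8.366680, 8.344651)
)

# Fit on the training data only.
model <- lm(y ~ x, data = training_set)
train_MSE <- mean(residuals(model)^2)

# Test MSE (MSPE): predict on the test x values, compare with the test y values.
test_pred <- predict(model, newdata = test_set)
test_MSE <- mean((test_set$y - test_pred)^2)

# Same quantity computed directly from the fitted coefficients.
b <- coef(model)
test_MSE_manual <- mean((test_set$y - (b[["(Intercept)"]] + b[["x"]] * test_set$x))^2)

print(c(train = train_MSE, test = test_MSE, test_manual = test_MSE_manual))
```

The two test values should agree, since `predict()` on an `lm` fit evaluates the same fitted line.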

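In practice, if we do have a spare test data set, we can compute MSPE directly as above. Very often, however, we don't have spare data. In that case, leave-one-out cross-validation gives an estimate of MSPE from the training data alone: each observation is held out in turn, the model is refitted on the remaining observations, and the squared prediction errors for the held-out points are averaged.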
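A rough sketch of leave-one-out cross-validation on the training data follows. It assumes the first table in the question is the training set; `loocv_errors`, `loocv_mspe` and `loocv_shortcut` are illustrative names. For an ordinary least squares fit, the explicit loop can be replaced by a known identity based on the leverages from `hatvalues()`, which is shown as a cross-check.

```r
# Assumption: the first table in the question is the training set.
training_set <- data.frame(
  x = 1:6,
  y = c(2.167512, 4.684017, 3.702477, 9.417312, 9.424831, 13.090983)
)

# Explicit loop: drop row i, refit, predict the held-out row.
loocv_errors <- sapply(seq_len(nrow(training_set)), function(i) {
  fit_i <- lm(y ~ x, data = training_set[-i, ])
  held_out <- training_set[i, , drop = FALSE]
  held_out$y - predict(fit_i, newdata = held_out)
})
loocv_mspe <- mean(loocv_errors^2)

# Shortcut for linear least squares: residual / (1 - leverage)
# equals the leave-one-out prediction error, so no refitting is needed.
model <- lm(y ~ x, data = training_set)
loocv_shortcut <- mean((residuals(model) / (1 - hatvalues(model)))^2)

print(c(loop = loocv_mspe, shortcut = loocv_shortcut))
```

The loop generalizes to any model with a `predict()` method, while the shortcut applies only to linear least squares fits.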

There are also several other statistics for assessing prediction error from the training data alone, such as Mallows's Cp statistic and AIC.
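As an illustration only, both statistics can be computed for two candidate models on the training data: an intercept-only fit and the fit with `x`. The training set assignment is again an assumption, and `mallows_cp()` is a hypothetical helper written here, not a base R function. It uses the common form Cp = RSS_p / sigma^2 - n + 2p, with sigma^2 estimated from the largest candidate model and p counting the intercept.

```r
# Assumption: the first table in the question is the training set.
training_set <- data.frame(
  x = 1:6,
  y = c(2.167512, 4.684017, 3.702477, 9.417312, 9.424831, 13.090983)
)

null_model <- lm(y ~ 1, data = training_set)  # intercept only
full_model <- lm(y ~ x, data = training_set)  # intercept and slope

# AIC is built into R; lower is better.
aic_table <- AIC(null_model, full_model)
print(aic_table)

# Hypothetical helper for Mallows's Cp.
n <- nrow(training_set)
sigma2_full <- sum(residuals(full_model)^2) / df.residual(full_model)
mallows_cp <- function(fit) {
  p <- length(coef(fit))  # number of coefficients, including the intercept
  sum(residuals(fit)^2) / sigma2_full - n + 2 * p
}

cp_null <- mallows_cp(null_model)
cp_full <- mallows_cp(full_model)
print(c(null = cp_null, full = cp_full))
```

By construction, Cp for the largest model equals its number of coefficients, so Cp is only informative for comparing smaller candidate models against it.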
