My question is pretty simple, but I can't find a clear-cut answer in the caret package documentation.
If I use the preprocessing options center and scale in my train function, the documentation states that the same preprocessing will be applied to a new data set when making predictions.
So when I use the predict function:
Does it mean that the mean and standard deviation of the training set are applied to the new data? Or are a new centering and scaling computed from the new data set, thus potentially using points in the future if the data are time series (which is problematic)?
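caret::predict.train uses the preprocessing parameters stored in the model you built (that is, the values estimated from the training set) when predicting on the test set. Nothing is re-estimated from the new data.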
Here is a snippet from the source code that shows the preProc data comes from the object's preProcess parameters:
out <- predictionFunction(method = object$modelInfo, modelFit = object$finalModel, newdata = newdata, preProc = object$preProcess)
You can see these parameters for yourself after creating your model by accessing the preProcess element of the fitted object (lmFit$preProcess in the example below).
Here is a complete example:
    rm(list=ls())
    library(caret)
    set.seed(4444)
    data(mtcars)
    inTrain <- createDataPartition(y=mtcars$mpg,p=0.75,list=FALSE)
    training <- mtcars[inTrain,]
    testing <- mtcars[-inTrain,]
    lmFit <- train(mpg~.,data=training,method="lm",preProc=c("center","scale"))
    lmFit$preProcess
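If you want to convince yourself rather than take the source code on trust, here is a rough sketch of a check on the same mtcars split. It assumes the fitted preProcess object exposes its centering and scaling values as `$mean` and `$std`. The names `predictors`, `scale_with_training`, `manualFit` and the `same_*` flags are made up for this illustration and are not part of caret. The idea is to compare the stored values with the training-set statistics, and then to rebuild the predictions by hand using only those training-set values.

```r
library(caret)

set.seed(4444)
data(mtcars)
inTrain  <- createDataPartition(y = mtcars$mpg, p = 0.75, list = FALSE)
training <- mtcars[inTrain, ]
testing  <- mtcars[-inTrain, ]

lmFit <- train(mpg ~ ., data = training, method = "lm",
               preProc = c("center", "scale"))

# Hypothetical helper names from here on; only train/predict come from caret.
predictors <- setdiff(names(training), "mpg")
pp <- lmFit$preProcess  # assumed to carry $mean and $std

# 1. The stored values should match the training-set means and sds.
same_mean <- all.equal(unname(pp$mean[predictors]),
                       unname(colMeans(training[, predictors])))
same_sd   <- all.equal(unname(pp$std[predictors]),
                       unname(sapply(training[, predictors], sd)))

# 2. Rebuild the predictions by hand, scaling BOTH sets with the
#    training-set values only (nothing is estimated from the test set).
scale_with_training <- function(df) {
  as.data.frame(scale(df[, predictors],
                      center = pp$mean[predictors],
                      scale  = pp$std[predictors]))
}
train_scaled <- cbind(scale_with_training(training), mpg = training$mpg)
manualFit    <- lm(mpg ~ ., data = train_scaled)

manual_preds <- predict(manualFit, newdata = scale_with_training(testing))
caret_preds  <- predict(lmFit, newdata = testing)
same_preds   <- all.equal(unname(manual_preds), unname(caret_preds))

# Each flag is TRUE when the corresponding comparison agrees.
print(c(same_mean  = isTRUE(same_mean),
        same_sd    = isTRUE(same_sd),
        same_preds = isTRUE(same_preds)))
```

If all three flags come back TRUE, the test rows were transformed with the training-set mean and standard deviation, so no information from the new data (or from the future, in a time series setting) leaks into the preprocessing step.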