Johan Larsson Johan Larsson - 3 months ago 25
R Question

Error adding interactions to custom VGAM::vglm model in caret

I've built a custom model with

caret
using
vglm()
from
VGAM
. It works fine with simple effects but when I try to add interactions it fails with a
object 'x1:x2' not found
error message, where
x1
and
x2
are predictor variables that I've entered into the model as an interaction. The problem has to do with the predictions, and unless I'm mistaken it seems to occur because either
predict.train
or
predictvglm
tries to use
x1:x2
to predict classes.

I've provided a working example below.

# Set up data
set.seed(123)
n <- 100
x1 <- rnorm(n, 175, 7)
x2 <- rnorm(n, 30, 8)
cont <- 0.5 * x1 - 0.3 * x2 + 10 + rnorm(n, 0, 6)
y <- cut(cont, breaks = quantile(cont), include.lowest = TRUE,
labels = c("A", "B", "C", "D"), ordered = TRUE)
d <- data.frame(x1, x2, y)

# My custom caret function
vglmTrain <- list(
label = "VGAM prop odds",
library = "VGAM",
loop = NULL,
type = "Classification",
parameters = data.frame(parameter = "parameter",
class = "character",
label = "parameter"),
grid = function(x, y,
len = NULL, search = "grid") data.frame(parameter = "none"),
fit = function(x, y, wts, param, lev, last, classProbs, ...) {
dat <- if(is.data.frame(x)) x else as.data.frame(x)
dat$.outcome <- y
if(!is.null(wts))
{
out <- vglm(.outcome ~ ., propodds, data = dat, weights = wts, ...)
} else {
out <- vglm(.outcome ~ ., propodds, data = dat, ...)
}
out
},
predict = function(modelFit, newdata, preProc = NULL, submodels = NULL) {
probs <- predict(modelFit, data.frame(newdata), type = "response")

predClass <- function (x) {
n <- colnames(x)
factor(as.vector(apply(x, 1, which.max)),
levels = 1:length(n),
labels = n)
}
predClass(probs)
},
prob = function(modelFit, newdata, preProc = NULL, submodels = NULL)
predict(modelFit, data.frame(newdata), type = "response"),
predictors = function(x, ...) names(attributes(terms(x))$dataClasses[-1]),
levels = function(x) x@misc$ynames,
sort = function(x) x)


Now if I try to use the function, it exits with an error if I supply the formula with an interaction.

# Load caret
library(caret)

ctrl <- trainControl(method = "cv", number = 2, verboseIter = T)

# A model with no interactions - works
f1 <- train(y ~ x1 + x2, data = d,
method = vglmTrain,
trControl = ctrl)

# A model with interactions - fails
f2 <- train(y ~ x1*x2, data = d,
method = vglmTrain,
trControl = ctrl)

Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning messages:
1: In eval(expr, envir, enclos) :
predictions failed for Fold1: parameter=none Error in eval(expr, envir, enclos) : object 'x1:x2' not found

2: In eval(expr, envir, enclos) :
predictions failed for Fold2: parameter=none Error in eval(expr, envir, enclos) : object 'x1:x2' not found

3: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.


Here's my sessionInfo():

> sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] VGAM_1.0-0 caret_6.0-64 ggplot2_2.1.0 lattice_0.20-33

loaded via a namespace (and not attached):
[1] Rcpp_0.12.3 magrittr_1.5 MASS_7.3-45 munsell_0.4.3 colorspace_1.2-6 foreach_1.4.3 minqa_1.2.4 stringr_1.0.0 car_2.1-1
[10] plyr_1.8.3 tools_3.2.4 nnet_7.3-12 pbkrtest_0.4-6 parallel_3.2.4 grid_3.2.4 gtable_0.2.0 nlme_3.1-125 mgcv_1.8-12
[19] quantreg_5.21 e1071_1.6-7 class_7.3-14 MatrixModels_0.4-1 iterators_1.0.8 lme4_1.1-11 Matrix_1.2-3 nloptr_1.0.4 reshape2_1.4.1
[28] codetools_0.2-14 stringi_1.0-1 compiler_3.2.4 scales_0.4.0 SparseM_1.7


Does anybody have any idea how to fix this?

Answer

Caret does handle interactions. I found a workaround however. You can first call model.matrix to create a matrix with the interactions. You need to remove the intercept as well.

Using your f2 as an example, we specify the data not as a formula, but as x and y. The x contains the model.matrix specification with the interactions and the -1 removes the intercept. This transformed as a data.frame and your are set to go.

f2 <- train(y = y, x = data.frame(model.matrix(y ~ x1*x2 - 1, data = d)),
            method = vglmTrain,
            trControl = ctrl)

EDIT:

After debugging the train.default and checking your modeltype specification and some others, I found a check that is done in the caret models and not in yours. The check has to do with the predict and probs functions. These both have a check on the Dataframe. If you add this check to both of these functions it works as expected.

if (!is.data.frame(newdata)) 
  newdata <- as.data.frame(newdata)

The whole function will then be:

vglmTrain <- list(
  label = "VGAM prop odds",
  library = "VGAM",
  loop = NULL,
  type = "Classification",
  parameters = data.frame(parameter = "parameter",
                          class = "character",
                          label = "parameter"),
  grid = function(x, y,
                  len = NULL, search = "grid") data.frame(parameter = "none"),
  fit = function(x, y, wts, param, lev, last, classProbs, ...) {
    dat <- if(is.data.frame(x)) x else as.data.frame(x)
    dat$.outcome <- y
    if(!is.null(wts))
    {
      out <- vglm(.outcome ~ ., propodds, data = dat, weights = wts, ...)
    } else {
      out <- vglm(.outcome ~ ., propodds, data = dat, ...)
    }
    out
  },
  predict = function(modelFit, newdata, preProc = NULL, submodels = NULL) {

    if (!is.data.frame(newdata)) 
      newdata <- as.data.frame(newdata)
    probs <- predict(modelFit, newdata, type = "response")

    predClass <- function (x) {
      n <- colnames(x)
      factor(as.vector(apply(x, 1, which.max)),
             levels = 1:length(n),
             labels = n)
    }
    predClass(probs)
  },
  prob = function(modelFit, newdata, preProc = NULL, submodels = NULL) {
    if (!is.data.frame(newdata)) 
      newdata <- as.data.frame(newdata)

    predict(modelFit, newdata, type = "response")
  },

  levels = function(x) x@misc$ynames,

  tags = c("Cumulative Link", "Logistic Regression", "Accepts Case Weights",
           "Probit", "Logit"),

  sort = function(x) x)