tacqy2 - 28 days ago
R Question

R caret package rfe error - "argument is not interpretable as logical"

I am trying to use rfeControl and rfe for a simple feature selection task using svm. The input file is small and has 20 features with 414 samples. The input can be found here [https://www.dropbox.com/sh/hj91gd06dbbyi1o/AABTHPuP4kI85onSqBiGH_ISa?dl=0].

Ignoring the warning, what I do not understand about the error below is this: as I understand it, the value of maximize is derived from whether metric == "RMSE", but I have metric = "Accuracy" because I am performing classification (reference: https://github.com/topepo/caret/blob/master/pkg/caret/R/rfe.R):

Error in if (maximize) which.max(x[, metric]) else which.min(x[, metric]) :
argument is not interpretable as logical
In addition: Warning message:
In if (maximize) which.max(x[, metric]) else which.min(x[, metric]) :
the condition has length > 1 and only the first element will be used


The code is as follows:

library("caret")
library("mlbench")
sensor6data_2class <- read.csv("/home/sensei/clustering/svm_2labels.csv")
sensor6data_2class <- within(sensor6data_2class, Class <- as.factor(Class))
sensor6data_2class$Class2 <- relevel(sensor6data_2class$Class,ref="1")

set.seed("1298356")
inTrain <- createDataPartition(y = sensor6data_2class$Class, p = .75, list = FALSE)
training <- sensor6data_2class[inTrain,]
testing <- sensor6data_2class[-inTrain,]
trainX <- training[,1:20]
y <- training[,21]

ctrl <- rfeControl(functions = rfFuncs , method = "repeatedcv", number = 5, repeats = 2, allowParallel = TRUE)
model_train <- rfe(x = trainX, y = y, sizes = c(10,11), metric = "Accuracy" , Class2 ~ ZCR + Energy + SpectralC + SpectralS + SpectralE + SpectralF + SpectralR + MFCC1 + MFCC2 + MFCC3 + MFCC4 + MFCC5 + MFCC6 + MFCC7 + MFCC8 + MFCC9 + MFCC10 + MFCC11 + MFCC12 + MFCC13, rfeControl = ctrl, method="svmRadial")


Thanks in advance.

Answer

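There are two problems in your code.

  1. You create a new column, Class2, but you never use it as y: training[,21] selects Class.
  2. You are mixing the formula interface and the x/y interface in a single rfe call. This is what leads to the error you get: x, y, sizes and metric are matched by name, so the unnamed formula appears to be matched positionally to the next free argument, maximize, and if (maximize) then fails because a formula is not interpretable as logical. Use either x and y or the formula notation with data, not both. Check the example code below.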
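To see why the second point produces this particular error, here is a minimal sketch in base R. The function toy_rfe is a hypothetical stand-in that only copies the argument order documented for caret's default rfe method (x, y, sizes, metric, maximize, rfeControl, ...); it does no modelling and is not caret's code. It simply reports what ends up in maximize.

```r
# Hypothetical stand-in: mimics only the argument ORDER of caret's rfe(),
# it is not the real implementation and fits nothing.
toy_rfe <- function(x, y, sizes = 2^(2:4), metric = "Accuracy",
                    maximize = metric != "RMSE", rfeControl = NULL, ...) {
  # Return the class of whatever was matched to `maximize`
  class(maximize)
}

# x, y, sizes, metric and rfeControl are matched by name first.
# The unnamed formula then fills the first unmatched formal: `maximize`.
mixed <- toy_rfe(x = 1:3, y = 1:3, sizes = c(10, 11), metric = "Accuracy",
                 Class2 ~ ZCR + Energy, rfeControl = NULL)

# Without the stray formula, `maximize` keeps its logical default.
clean <- toy_rfe(x = 1:3, y = 1:3, sizes = c(10, 11), metric = "Accuracy",
                 rfeControl = NULL)

print(mixed)  # class of the formula that landed in `maximize`
print(clean)  # class of the default value
```

Once a formula sits in maximize, a test such as if (maximize) cannot succeed. The exact message depends on the R version: older releases warn about a condition of length > 1 and then fail with "argument is not interpretable as logical", as in the question.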

The code below works:

library("caret")
# Read the data and make the outcome a factor so caret treats this as classification
sensor6data_2class <- read.csv("svm_2labels.csv")
sensor6data_2class$Class <- as.factor(sensor6data_2class$Class)
# sensor6data_2class$Class2 <- relevel(sensor6data_2class$Class,ref="1")

# Split into 75% training and 25% testing, stratified on Class
set.seed(1298356)
inTrain <- createDataPartition(y = sensor6data_2class$Class, p = .75, list = FALSE)
training <- sensor6data_2class[inTrain,]
testing <- sensor6data_2class[-inTrain,]
trainX <- training[,1:20]  # the 20 feature columns
y <- training[,21]         # the Class column

# Random forest helper functions, 5-fold cross-validation repeated twice
ctrl <- rfeControl(functions = rfFuncs , 
                   method = "repeatedcv",
                   number = 5, 
                   repeats = 2, 
                   allowParallel = TRUE)

# Option 1: x and y notation
set.seed(1298356)
model_train <- rfe(x = trainX, 
                   y = y, 
                   sizes = c(10,11), 
                   metric = "Accuracy" , 
                   rfeControl = ctrl)

# Option 2: formula notation (no x or y arguments)
set.seed(1298356)
model_train_form <- rfe(Class ~ ZCR + Energy + SpectralC + SpectralS + SpectralE + SpectralF + SpectralR + MFCC1 + MFCC2 + MFCC3 + MFCC4 + MFCC5 + MFCC6 + MFCC7 + MFCC8 + MFCC9 + MFCC10 + MFCC11 + MFCC12 + MFCC13, 
                        data = training,
                        sizes = c(10,11), 
                        metric = "Accuracy",
                        rfeControl = ctrl)
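
On the first point of the answer (selecting Class when Class2 was intended), picking columns by name rather than by position makes it harder to grab the wrong label. The sketch below uses a small made-up data frame, toy, with only two of the feature names from the question. The values are invented for illustration and are not taken from the real file.

```r
# Made-up data standing in for the real file: two features plus both label columns
toy <- data.frame(ZCR    = c(0.1, 0.4, 0.3, 0.8),
                  Energy = c(1.2, 0.7, 0.9, 0.5),
                  Class  = factor(c(0, 1, 0, 1)))
toy$Class2 <- relevel(toy$Class, ref = "1")

# Selecting by position silently picks Class, not the releveled Class2
by_position <- toy[, 3]

# Selecting by name states which label is meant
by_name <- toy[["Class2"]]

# Features: everything except the label columns, again by name
feature_names <- setdiff(names(toy), c("Class", "Class2"))
toy_x <- toy[, feature_names]

print(levels(by_position))  # level order of Class
print(levels(by_name))      # level order of Class2 (reference level first)
print(feature_names)
```

The same pattern would apply to the real data: something like y <- training[["Class2"]] if the releveled factor is the one you want, assuming that column exists in training.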