tacqy2 - 5 months ago 24

R Question

I am trying to use `rfeControl` and `rfe` for a simple feature selection task with an SVM. The input file is small: 20 features and 414 samples. The input can be found here [https://www.dropbox.com/sh/hj91gd06dbbyi1o/AABTHPuP4kI85onSqBiGH_ISa?dl=0].

Ignoring the warning, what I do not understand about the error below is this: as I understand it, the value of `maximize` depends on whether `metric == "RMSE"`, but I have `metric == "Accuracy"` because I am performing classification (reference: https://github.com/topepo/caret/blob/master/pkg/caret/R/rfe.R):

```
Error in if (maximize) which.max(x[, metric]) else which.min(x[, metric]) :
  argument is not interpretable as logical
In addition: Warning message:
In if (maximize) which.max(x[, metric]) else which.min(x[, metric]) :
  the condition has length > 1 and only the first element will be used
```

The code is as follows:

```
library("caret")
library("mlbench")
sensor6data_2class <- read.csv("/home/sensei/clustering/svm_2labels.csv")
sensor6data_2class <- within(sensor6data_2class, Class <- as.factor(Class))
sensor6data_2class$Class2 <- relevel(sensor6data_2class$Class,ref="1")
set.seed("1298356")
inTrain <- createDataPartition(y = sensor6data_2class$Class, p = .75, list = FALSE)
training <- sensor6data_2class[inTrain,]
testing <- sensor6data_2class[-inTrain,]
trainX <- training[,1:20]
y <- training[,21]
ctrl <- rfeControl(functions = rfFuncs , method = "repeatedcv", number = 5, repeats = 2, allowParallel = TRUE)
model_train <- rfe(x = trainX, y = y, sizes = c(10,11), metric = "Accuracy" , Class2 ~ ZCR + Energy + SpectralC + SpectralS + SpectralE + SpectralF + SpectralR + MFCC1 + MFCC2 + MFCC3 + MFCC4 + MFCC5 + MFCC6 + MFCC7 + MFCC8 + MFCC9 + MFCC10 + MFCC11 + MFCC12 + MFCC13, rfeControl = ctrl, method="svmRadial")
```

Thanks in advance.

Answer

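There are two errors in your code.

- You create a new column `Class2` but never use it as the outcome: `y <- training[,21]` selects `Class`.
- You mix the formula interface with the `x` and `y` interface in the same `rfe` call. This is what causes the error: the unnamed formula is matched positionally to the next free argument of `rfe`, which is `maximize`, so `if (maximize)` receives a formula instead of a logical value. Use either `x` and `y` or the formula interface, not both. Check the example code below.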
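To see why the formula ends up in `maximize`, here is a minimal base-R sketch. `mock_rfe` is a hypothetical stand-in, not caret code; it only assumes the argument order `x, y, sizes, metric, maximize, rfeControl, ...`, which is how I read the linked `rfe.R`. The shortened formula and the dummy values are placeholders.

```r
# Hypothetical stand-in that mirrors the assumed argument order of caret's
# rfe.default(); it returns whatever object gets bound to `maximize`.
mock_rfe <- function(x, y, sizes = 2^(2:4), metric = "Accuracy",
                     maximize = TRUE, rfeControl = NULL, ...) {
  maximize
}

# Same shape as the failing call: x, y, sizes, metric and rfeControl are named,
# and the formula is the only unnamed argument.
bound <- mock_rfe(x = 1, y = 2, sizes = c(10, 11), metric = "Accuracy",
                  Class2 ~ ZCR + Energy, rfeControl = NULL,
                  method = "svmRadial")

print(class(bound))   # "formula"
print(length(bound))  # 3, hence the "length > 1" complaint

# Using a formula as an `if` condition fails, which is what rfe runs into.
failed <- inherits(try(if (bound) "max" else "min", silent = TRUE), "try-error")
print(failed)         # TRUE
```

Naming every argument, or dropping the formula when `x` and `y` are supplied, avoids the positional match.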

The code below works:

```
library("caret")

sensor6data_2class <- read.csv("svm_2labels.csv")
sensor6data_2class$Class <- as.factor(sensor6data_2class$Class)
# sensor6data_2class$Class2 <- relevel(sensor6data_2class$Class,ref="1")

# Split into 75% training and 25% test data, stratified by Class
set.seed(1298356)
inTrain <- createDataPartition(y = sensor6data_2class$Class, p = .75, list = FALSE)
training <- sensor6data_2class[inTrain,]
testing <- sensor6data_2class[-inTrain,]

trainX <- training[,1:20]   # the 20 features
y <- training[,21]          # the outcome, Class

ctrl <- rfeControl(functions = rfFuncs,
                   method = "repeatedcv",
                   number = 5,
                   repeats = 2,
                   allowParallel = TRUE)

# Option 1: x and y interface
set.seed(1298356)
model_train <- rfe(x = trainX,
                   y = y,
                   sizes = c(10,11),
                   metric = "Accuracy",
                   rfeControl = ctrl)

# Option 2: formula interface
set.seed(1298356)
model_train_form <- rfe(Class ~ ZCR + Energy + SpectralC + SpectralS + SpectralE + SpectralF + SpectralR + MFCC1 + MFCC2 + MFCC3 + MFCC4 + MFCC5 + MFCC6 + MFCC7 + MFCC8 + MFCC9 + MFCC10 + MFCC11 + MFCC12 + MFCC13,
                        data = training,
                        sizes = c(10,11),
                        metric = "Accuracy",
                        rfeControl = ctrl)
```
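Note that `rfFuncs` ranks and fits with a random forest, so the calls above do not use an SVM. If an SVM is what you want, one option is `caretFuncs`, which (as far as I understand the caret documentation) passes extra arguments such as `method` on to `train`. The sketch below is an assumption-laden illustration rather than a tested drop-in: it uses a two-class subset of the built-in `iris` data as stand-in input, the names `demoX`, `demoY` and `svm_ctrl` are made up, and `svmRadial` needs the kernlab package installed.

```r
library(caret)  # method = "svmRadial" additionally requires kernlab

# Stand-in data (assumption): a two-class subset of iris instead of the CSV
demo  <- droplevels(subset(iris, Species != "setosa"))
demoX <- demo[, 1:4]
demoY <- demo$Species

# caretFuncs fits each candidate feature subset with train()
svm_ctrl <- rfeControl(functions = caretFuncs, method = "cv", number = 5)

set.seed(1298356)
model_svm <- rfe(x = demoX,
                 y = demoY,
                 sizes = c(2, 3),
                 metric = "Accuracy",
                 rfeControl = svm_ctrl,
                 method = "svmRadial",   # forwarded to train()
                 trControl = trainControl(method = "cv", number = 3))

print(predictors(model_svm))  # names of the selected features
```

With your own data you would replace `demoX` and `demoY` with `trainX` and `y` and set `sizes` back to `c(10,11)`.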