asuka - 1 year ago 110
R Question

# R and Random Forest: How caret and pROC deal with positive and negative class?

Over the past few days, I've been analyzing the performance of R's implementation of Random Forest and the different tools available to obtain:

• AUC

• Sensitivity

• Specificity

Thus, I've used two different methods:

• roc and coords from the pROC library, to obtain the performance of the model at different cutoff points.

• confusionMatrix from the caret library, to obtain the performance of the model's class predictions (Accuracy, Sensitivity, Specificity, ...)

The point is that I've realized there are some differences between the two approaches.

I've developed the following code:

``````
suppressMessages(library(randomForest))
suppressMessages(library(pROC))
suppressMessages(library(caret))

set.seed(100)

# Training set: 10 observations, 10 random features, random labels A/B
t_x <- as.data.frame(matrix(runif(100), ncol = 10))
t_y <- factor(sample(c("A", "B"), 10, replace = TRUE), levels = c("A", "B"))

# Validation set: 5 observations
v_x <- as.data.frame(matrix(runif(50), ncol = 10))
v_y <- factor(sample(c("A", "B"), 5, replace = TRUE), levels = c("A", "B"))

model <- randomForest(t_x, t_y, ntree = 1000, importance = TRUE)
prob.out <- predict(model, v_x, type = "prob")[, 1]        # probability of class "A"
prediction.out <- predict(model, v_x, type = "response")   # predicted class labels

# ROC curve, then performance at cutoffs 0, 0.01, ..., 1
mroc <- roc(v_y, prob.out, plot = FALSE)

results <- coords(mroc, seq(0, 1, by = 0.01), input = "threshold",
                  ret = c("sensitivity", "specificity", "ppv", "npv"))

# Performance of the predicted class labels
accuracyData <- confusionMatrix(prediction.out, v_y)
``````

If you compare the results and accuracyData variables, you can see that sensitivity and specificity are swapped between them.

That is, the confusionMatrix results are:

``````
Confusion Matrix and Statistics

          Reference
Prediction A B
         A 1 1
         B 2 1

               Accuracy : 0.4
                 95% CI : (0.0527, 0.8534)
    No Information Rate : 0.6
    P-Value [Acc > NIR] : 0.913

                  Kappa : -0.1538
 Mcnemar's Test P-Value : 1.000

            Sensitivity : 0.3333
            Specificity : 0.5000
         Pos Pred Value : 0.5000
         Neg Pred Value : 0.3333
             Prevalence : 0.6000
         Detection Rate : 0.2000
   Detection Prevalence : 0.4000
      Balanced Accuracy : 0.4167

       'Positive' Class : A
``````

But if I look for such Sensitivity and Specificity in the coords calculation, I find them swapped:

``````
     sensitivity specificity       ppv       npv
0.32         0.5   0.3333333 0.3333333 0.5000000
``````

Apparently, Sensitivity and Specificity are swapped between coords and confusionMatrix.

Given that confusionMatrix identifies the positive class correctly, I assume that its interpretation of Sensitivity and Specificity is the right one.

My question is: Is there any way of forcing coords to interpret the positive and negative classes in the way I want to?

If you look at the output of `confusionMatrix`, you can see this:

``````
       'Positive' Class : A
``````

Now looking at `mroc`, class B is taken as the positive class:

``````
Data: prob.out in 3 controls (v_y A) < 2 cases (v_y B).
``````
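The swap can be reproduced by hand from the 2x2 table in the question. The sketch below uses only base R; `rates` is a hypothetical helper written for this illustration, not a function from either package. It computes sensitivity and specificity from the same counts, once with A and once with B as the positive class.

```r
# Counts copied from the confusionMatrix output in the question
# (rows = prediction, columns = reference; matrix() fills column by column).
cm <- matrix(c(1, 2, 1, 1), nrow = 2,
             dimnames = list(Prediction = c("A", "B"),
                             Reference  = c("A", "B")))

# Hypothetical helper: sensitivity and specificity for a chosen positive class.
rates <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  c(sensitivity = cm[positive, positive] / sum(cm[, positive]),
    specificity = cm[negative, negative] / sum(cm[, negative]))
}

rates(cm, "A")  # sensitivity 1/3, specificity 1/2: the caret figures
rates(cm, "B")  # sensitivity 1/2, specificity 1/3: the coords figures
```

The counts are identical in both calls; only the class labelled "positive" changes, which is enough to exchange the two values.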

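By default, `pROC` treats the first level of your factor as the negative class (controls) and the second as the positive class (cases), whereas `caret`'s `confusionMatrix` treats the first level as the positive class. You can specify the levels explicitly with `pROC` to get the same behaviour: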

``````
mroc <- roc(v_y, prob.out, plot = FALSE, levels = c("B", "A"))
``````
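As a rough check that the two packages then agree, here is a minimal sketch on hand-made data. The labels in `truth` and the scores in `prob_A` are made up for illustration and are not taken from the question. It also sets `direction` explicitly, because `roc` otherwise chooses the comparison direction automatically from the group medians, and it shows the converse option of passing `positive` to `confusionMatrix` if you would rather change `caret`'s side.

```r
library(pROC)
library(caret)

# Hypothetical toy data: true classes and made-up predicted probabilities of class "A"
truth  <- factor(c("A", "A", "A", "B", "B"), levels = c("A", "B"))
prob_A <- c(0.9, 0.8, 0.3, 0.6, 0.2)
pred   <- factor(ifelse(prob_A > 0.5, "A", "B"), levels = c("A", "B"))

# Option 1: keep caret's default (first level "A" is positive) and make pROC match.
# levels = c(control, case); direction = "<" says controls score lower than cases.
cm_A  <- confusionMatrix(pred, truth)
roc_A <- roc(truth, prob_A, levels = c("B", "A"), direction = "<")
co_A  <- coords(roc_A, x = 0.5, input = "threshold",
                ret = c("sensitivity", "specificity"))

# Option 2: keep pROC's default levels ("B" is the case) and make caret match.
# prob_A is low for class "B", so controls score higher than cases: direction = ">".
cm_B  <- confusionMatrix(pred, truth, positive = "B")
roc_B <- roc(truth, prob_A, levels = c("A", "B"), direction = ">")
co_B  <- coords(roc_B, x = 0.5, input = "threshold",
                ret = c("sensitivity", "specificity"))

# Sensitivity from caret next to sensitivity from pROC, per option
c(caret = cm_A$byClass[["Sensitivity"]], pROC = co_A[["sensitivity"]])
c(caret = cm_B$byClass[["Sensitivity"]], pROC = co_B[["sensitivity"]])
```

Whichever option you pick, the point is to state the positive class on both sides instead of relying on each package's default.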