asuka - 1 year ago 84

R Question

Over the past few days, I've been analyzing the performance of R's implementation of Random Forest and the different tools available to obtain:

- AUC
- Sensitivity
- Specificity

Thus, I've used two different methods:

- roc and coords from the **pROC** library, to obtain the performance of the model at different cutoff points.
- confusionMatrix from the **caret** library, to obtain the optimal performance of the model (AUC, Accuracy, Sensitivity, Specificity, ...)

The point is that I've realized that there are some differences between the two approaches.

I've developed the following code:

```
suppressMessages(library(randomForest))
suppressMessages(library(pROC))
suppressMessages(library(caret))

set.seed(100)

# Training set: 10 observations, 10 random features, random labels
t_x <- as.data.frame(matrix(runif(100), ncol = 10))
t_y <- factor(sample(c("A", "B"), 10, replace = TRUE), levels = c("A", "B"))

# Validation set: 5 observations
v_x <- as.data.frame(matrix(runif(50), ncol = 10))
v_y <- factor(sample(c("A", "B"), 5, replace = TRUE), levels = c("A", "B"))

model <- randomForest(t_x, t_y, ntree = 1000, importance = TRUE)

# Column 1 of the probability matrix is the probability of class "A"
prob.out <- predict(model, v_x, type = "prob")[, 1]
prediction.out <- predict(model, v_x, type = "response")

mroc <- roc(v_y, prob.out, plot = FALSE)
results <- coords(mroc, seq(0, 1, by = 0.01), input = "threshold",
                  ret = c("sensitivity", "specificity", "ppv", "npv"))

accuracyData <- confusionMatrix(prediction.out, v_y)
```

If you compare the results of the two methods, they do not match. That is, the confusionMatrix results are:

```
Confusion Matrix and Statistics

          Reference
Prediction A B
         A 1 1
         B 2 1

               Accuracy : 0.4
                 95% CI : (0.0527, 0.8534)
    No Information Rate : 0.6
    P-Value [Acc > NIR] : 0.913

                  Kappa : -0.1538
 Mcnemar's Test P-Value : 1.000

            Sensitivity : 0.3333
            Specificity : 0.5000
         Pos Pred Value : 0.5000
         Neg Pred Value : 0.3333
             Prevalence : 0.6000
         Detection Rate : 0.2000
   Detection Prevalence : 0.4000
      Balanced Accuracy : 0.4167

       'Positive' Class : A
```

But when I look up the same Sensitivity and Specificity in the coords output, I find them swapped:

```
     sensitivity specificity       ppv       npv
0.32         0.5   0.3333333 0.3333333 0.5000000
```

Apparently, Sensitivity and Specificity are swapped between coords and confusionMatrix.

Since confusionMatrix correctly identifies the positive class, I assume that its interpretation of Sensitivity and Specificity is the right one.

My question is:

Answer Source

If you look at the output of `confusionMatrix`, you can see this:

```
'Positive' Class : A
```

Now looking at `mroc`, class B is taken as the positive class:

```
Data: prob.out in 3 controls (v_y A) < 2 cases (v_y B).
```

Basically, `pROC` reads the levels of your factor in the order Negative, Positive (controls first, cases second), while `caret` does the exact opposite and takes the first level as the positive class. You can specify your levels explicitly with `pROC` to get the same behaviour:

```
mroc <- roc(v_y,prob.out,plot=F, levels = c("B", "A"))
```
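To see why the two sets of numbers are mirror images, here is a minimal base-R sketch that recomputes sensitivity and specificity from the confusion table in the question, once with A as the positive class and once with B. The helper `sens_spec` is hypothetical and written only for this illustration; it is not part of `caret` or `pROC`.

```r
# Confusion table copied from the confusionMatrix output in the question
# (rows = prediction, columns = reference). matrix() fills column by column.
tab <- matrix(c(1, 2, 1, 1), nrow = 2,
              dimnames = list(Prediction = c("A", "B"),
                              Reference  = c("A", "B")))

# Hypothetical helper (not a caret or pROC function): sensitivity and
# specificity of a 2x2 table for a chosen positive class.
sens_spec <- function(tab, positive) {
  negative <- setdiff(colnames(tab), positive)
  tp <- tab[positive, positive]  # predicted positive, truly positive
  fn <- tab[negative, positive]  # predicted negative, truly positive
  tn <- tab[negative, negative]  # predicted negative, truly negative
  fp <- tab[positive, negative]  # predicted positive, truly negative
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

as_A <- sens_spec(tab, positive = "A")  # first level positive, as caret does by default
as_B <- sens_spec(tab, positive = "B")  # second level positive, as pROC does by default

print(rbind(as_A, as_B))
```

With A as the positive class this gives the 0.3333 / 0.5000 pair reported by `confusionMatrix`; with B it gives the swapped 0.5 / 0.3333 pair that appears in the `coords` row quoted in the question. The alignment can also be done on the `caret` side, since `confusionMatrix` accepts a `positive` argument, e.g. `confusionMatrix(prediction.out, v_y, positive = "B")`.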