Travis Heeter - 1 year ago 81
R Question

# How to send a confusion matrix to caret's confusionMatrix?

I'm looking at this data set: https://archive.ics.uci.edu/ml/datasets/Credit+Approval. I built a ctree:

``````library(party)             # assumption: ctree() comes from party (partykit also provides one)
myFormula <- class ~ .     # class is a factor of "+" or "-"
ct <- ctree(myFormula, data = train)
``````

And now I'd like to put that data into caret's confusionMatrix method to get all the stats associated with the confusion matrix:

``````testPred <- predict(ct, newdata = test)

#### This is where I'm doing something wrong ####
confusionMatrix(table(testPred, test$class),positive="+")
####  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ####

$positive
[1] "+"

$table
        td
testPred  -  +
       - 99  6
       + 20 88

$overall
      Accuracy          Kappa  AccuracyLower  AccuracyUpper   AccuracyNull AccuracyPValue  McnemarPValue
  8.779343e-01   7.562715e-01   8.262795e-01   9.186911e-01   5.586854e-01   6.426168e-24   1.078745e-02

$byClass
         Sensitivity          Specificity       Pos Pred Value       Neg Pred Value            Precision               Recall                   F1
           0.9361702            0.8319328            0.8148148            0.9428571            0.8148148            0.9361702            0.8712871
          Prevalence       Detection Rate Detection Prevalence    Balanced Accuracy
           0.4413146            0.4131455            0.5070423            0.8840515

$mode
[1] "sens_spec"

$dots
list()

attr(,"class")
[1] "confusionMatrix"
``````

So Sensitivity is `Sensitivity = A/(A+C)` (from caret's confusionMatrix doc).

If you take my confusion matrix:

``````$table
        td
testPred  -  +
       - 99  6
       + 20 88
``````

You can see this doesn't add up: `Sensitivity = 99/(99+20) = 99/119 = 0.8319328`. In my confusionMatrix results, that value is for Specificity. However, Specificity is `Specificity = D/(B+D) = 88/(88+6) = 88/94 = 0.9361702`, the value for Sensitivity.
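The arithmetic above depends on which label is treated as the event. As a minimal sketch in base R (no caret needed), the helper below applies the `A/(A+C)` and `D/(B+D)` formulas to the table from the question for either choice of positive label. The function name `sens_spec` is hypothetical and exists only for this illustration.

```r
# Confusion matrix from the question: rows = predicted, columns = actual.
tab <- matrix(c(99, 20, 6, 88), nrow = 2,
              dimnames = list(testPred = c("-", "+"), td = c("-", "+")))

# Hypothetical helper: sensitivity and specificity for a chosen positive label.
sens_spec <- function(tab, positive) {
  negative <- setdiff(colnames(tab), positive)
  A <- tab[positive, positive]  # predicted event,     actual event
  B <- tab[positive, negative]  # predicted event,     actual non-event
  C <- tab[negative, positive]  # predicted non-event, actual event
  D <- tab[negative, negative]  # predicted non-event, actual non-event
  c(Sensitivity = A / (A + C), Specificity = D / (B + D))
}

with_plus  <- sens_spec(tab, "+")  # "+" treated as the event
with_minus <- sens_spec(tab, "-")  # "-" treated as the event
print(round(with_plus, 7))
print(round(with_minus, 7))
```

With `"+"` as the event, sensitivity is 88/94 and specificity is 99/119, which matches the `$byClass` output in the question. The hand calculation corresponds to treating `"-"` as the event, which swaps the two values.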

I've tried `confusionMatrix(td,testPred, positive="+")` but got even weirder results. What am I doing wrong?

UPDATE: I also realized that my confusion matrix is different than what caret thought it was:

``````   Mine:                  Caret:

        td                  testPred
testPred  -  +            td  -  +
       - 99  6             - 99 20
       + 20 88             +  6 88
``````

As you can see, it thinks my False Positive and False Negative are backwards.

UPDATE: I found it's a lot better to send the data, rather than a table as a parameter. From the confusionMatrix docs:

reference
a factor of classes to be used as the true results

I took this to mean what symbol constitutes a positive outcome. In my case, this would have been a `+`. However, 'reference' refers to the actual outcomes from the data set, aka the dependent variable.

So I should have used `confusionMatrix(testPred, test$class)`. If your data is out of order for some reason, it will shift it into the correct order (so the positive and negative outcomes/predictions align correctly in the confusion matrix).
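To make the `data`/`reference` call concrete, here is a small sketch that rebuilds two factors reproducing the counts in my table and passes them as predictions and true classes. The vectors `pred` and `actual` are stand-ins for `testPred` and `test$class`, not the real data. The caret call is guarded so the snippet still runs without the package installed; the `e1071` check is there on the assumption that `confusionMatrix` may ask for it.

```r
# Stand-in factors that reproduce the counts from the question's table.
counts <- c(99, 6, 20, 88)
pred   <- factor(rep(c("-", "-", "+", "+"), times = counts), levels = c("-", "+"))
actual <- factor(rep(c("-", "+", "-", "+"), times = counts), levels = c("-", "+"))

# Base R cross-tabulation: predictions in rows, true classes in columns.
tab <- table(Prediction = pred, Reference = actual)
print(tab)

# Only call caret when it (and e1071, which it may require) is available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("e1071", quietly = TRUE)) {
  cm <- caret::confusionMatrix(data = pred, reference = actual, positive = "+")
  print(cm$byClass[c("Sensitivity", "Specificity")])
}
```

Passing `positive = "+"` explicitly keeps the reported sensitivity tied to the `+` class regardless of how the factor levels happen to be ordered.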

However, if you are worried about the outcome being the correct factor, install the `plyr` library, and use `revalue` to change the factor:

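``````install.packages("plyr")
library(plyr)
newDF <- df
# revalue() renames factor levels: names are the old levels, values the new ones (as strings)
newDF$class <- revalue(newDF$class, c("+" = "1", "-" = "0"))
# You'd have to rerun your model using newDF
``````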

I'm not sure why this worked, but I just removed the positive parameter:

``````confusionMatrix(table(testPred, test$class))
``````

My Confusion Matrix:

``````        td
testPred  -  +
       - 99  6
       + 20 88
``````

Caret's Confusion Matrix:

``````        td
testPred  -  +
       - 99  6
       + 20 88
``````

Although now it says `$positive: "-"`, so I'm not sure if that's good or bad.
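As far as I can tell from the confusionMatrix docs, when `positive` is not given and there are only two levels, the first factor level is used as the positive class, which would explain the `"-"`. A minimal base R sketch of checking and changing the first level follows; `cls` is a toy factor for illustration, not my real `test$class`.

```r
# Toy factor with "-" as the first level, as in the question's table.
cls <- factor(c("+", "-", "+", "-"), levels = c("-", "+"))
first_before <- levels(cls)[1]   # the level used when positive is omitted

# relevel() moves the chosen level to the front without changing the values.
cls2 <- relevel(cls, ref = "+")
first_after <- levels(cls2)[1]

print(first_before)
print(first_after)
```

Either passing `positive = "+"` or releveling both the predictions and the reference in the same way should make `+` the reported positive class. Releveling also changes the row and column order of the printed table.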
