Andrew - 3 months ago 33

R Question

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is an example in MATLAB where I don't understand

`R(Actual == 1))`

Because R (not to be confused with the R language) is defined as a vector but used as a function?

Answer

As mentioned by others, you can compute the AUC using the ROCR package. With the ROCR package you can also plot the ROC curve, lift curve and other model selection measures.
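A minimal sketch of the ROCR route, assuming the package is installed and that the labels are coded 0/1 with higher scores meaning class 1. The `scores` and `labels` vectors are made-up toy data, not values from the question:

```r
library(ROCR)

# Toy data, invented for illustration only.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Pair the scores with the true labels.
pred <- prediction(scores, labels)

# performance() returns an S4 object; the AUC sits in its y.values slot.
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc

# ROC curve: true positive rate against false positive rate.
plot(performance(pred, measure = "tpr", x.measure = "fpr"))
```

For this toy data 13 of the 16 positive/negative pairs are ordered correctly, so `auc` should come out as 0.8125.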

You can also compute the AUC directly, without any package, by using the fact that the AUC is equal to the probability that a randomly chosen positive example is scored higher than a randomly chosen negative example.

For example, if `pos.scores` is a vector containing the scores of the positive examples and `neg.scores` is a vector containing the scores of the negative examples, then the AUC is approximated by:

```
> mean(sample(pos.scores, 1000, replace=TRUE) > sample(neg.scores, 1000, replace=TRUE))
[1] 0.7261
```

The result varies slightly from run to run because the pairs are sampled at random. You can also estimate the variance of the AUC by bootstrapping:

```
> aucs <- replicate(1000, mean(sample(pos.scores, 1000, replace=TRUE) > sample(neg.scores, 1000, replace=TRUE)))
```
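The sampling above only approximates the pairwise probability. When the data are small enough to compare every positive with every negative, the same quantity can be computed exactly. The sketch below starts from the question's inputs (a score vector and a 0/1 label vector). The helper names `auc_exact` and `auc_rank` are hypothetical, and counting ties as one half is an assumption, although it is the usual convention. The rank-based form appears to be what the MATLAB snippet in the question does: there `R(Actual == 1)` indexes the rank vector `R` with a logical mask rather than calling a function, and the R equivalent is `r[labels == 1]`.

```r
# Hypothetical helper: exact AUC from all positive/negative pairs.
auc_exact <- function(scores, labels) {
  pos.scores <- scores[labels == 1]
  neg.scores <- scores[labels == 0]
  # One cell per (positive, negative) pair; a tie counts as half a win.
  wins <- outer(pos.scores, neg.scores, ">")
  ties <- outer(pos.scores, neg.scores, "==")
  mean(wins + 0.5 * ties)
}

# Hypothetical helper: same value via ranks (the Mann-Whitney U statistic),
# which avoids building the full pair matrix.
auc_rank <- function(scores, labels) {
  n.pos <- as.numeric(sum(labels == 1))
  n.neg <- as.numeric(sum(labels == 0))
  r <- rank(scores)  # average ranks, so tied scores get half credit
  (sum(r[labels == 1]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

# Toy data, invented for illustration only.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

auc_exact(scores, labels)
auc_rank(scores, labels)
```

Both calls should return the same number for any input. The pairwise version is easier to read, while the rank version scales better when there are many examples.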