Faydey - 1 year ago 102
R Question

# Obtaining threshold values from a ROC curve

I have some models. Using the `ROCR` package on a vector of the predicted class percentages, I have created a performance object. Plotting the performance object with the measures "tpr" and "fpr" gives me a ROC curve.

I'm comparing models at certain thresholds of false positive rate (x). I'm hoping to get the value of the true positive rate (y) out of the performance object. Even more, I would like to get the class percentage threshold that was used to generate that point.

The index number of the false positive rate (`x-value`) that is closest to the threshold without being above it should give me the index number of the appropriate true positive rate (`y-value`). I'm not exactly sure how to get that index value.

And more to the point, how do I get the threshold of class probabilities that was used to make that point?

This is why `str` is my favorite R function:

``````
library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf)
> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "False positive rate"
  ..@ y.name      : chr "True positive rate"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:201] 0 0 0 0 0.00935 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:201] 0 0.0108 0.0215 0.0323 0.0323 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...
``````

Ahah! It's an S4 class, so we can use `@` to access the slots. Here's how you make a `data.frame`:

``````
cutoffs <- data.frame(cut = perf@alpha.values[[1]],
                      fpr = perf@x.values[[1]],
                      tpr = perf@y.values[[1]])
> head(cutoffs)
        cut         fpr        tpr
1       Inf 0.000000000 0.00000000
2 0.9910964 0.000000000 0.01075269
3 0.9846673 0.000000000 0.02150538
4 0.9845992 0.000000000 0.03225806
5 0.9834944 0.009345794 0.03225806
6 0.9706413 0.009345794 0.04301075
``````
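For the index lookup the question asks about, here is a minimal sketch in base R. It assumes the six rows shown above as stand-in data (so it runs without `ROCR`) and a hypothetical limit `max_fpr` of 0.005 picked purely for illustration. The names `max_fpr`, `eligible` and `idx` are mine, not part of `ROCR`.

```r
# Stand-in data: the six rows of `cutoffs` shown above.
cutoffs <- data.frame(
  cut = c(Inf, 0.9910964, 0.9846673, 0.9845992, 0.9834944, 0.9706413),
  fpr = c(0, 0, 0, 0, 0.009345794, 0.009345794),
  tpr = c(0, 0.01075269, 0.02150538, 0.03225806, 0.03225806, 0.04301075)
)

max_fpr <- 0.005  # hypothetical FPR limit, chosen only for illustration

# Indices of rows whose false positive rate does not exceed the limit.
eligible <- which(cutoffs$fpr <= max_fpr)

# Among those rows, the index of the one with the highest true positive rate.
idx <- eligible[which.max(cutoffs$tpr[eligible])]

# That row carries the cutoff, fpr and tpr for the chosen point.
cutoffs[idx, ]
```

With these six rows `idx` is 4, so the selected point has cutoff 0.9845992 and tpr 0.03225806. On a real `perf` object you would build `cutoffs` from the slots as above and keep the rest unchanged.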

If you have an fpr threshold you want to hit, you can sort and subset this `data.frame` to find the maximum tpr below that fpr threshold:

``````
cutoffs <- cutoffs[order(cutoffs$tpr, decreasing = TRUE), ]