ambroise ambroise - 1 year ago 102
R Question

Issue with ROC curve where 'test positive' is below a certain threshold

I am working on evaluating a screening test for osteoporosis, and I have a large set of data where we measured values of bone density. We classified individuals as being 'disease positive' for osteoporosis if they had a vertebral fracture present on the images when we took the bone density measure.

The 'disease positive' has a lower distribution of the continuous value than the disease negative group.

We want to determine which threshold for the continuous variable is best for determining if an individual is at a higher risk for future fractures. We've found that the lower the value is, the higher the risk. I used Stata to create some tables to calculate sensitivity and specificity at a few different thresholds. Again, a person is 'test positive' if their value is below the threshold. I made this table here:

We wanted to show this in graphical form, so I decided to make an ROC curve, and I used the ROCR package to do so. Here is the code I used in R:

prevalentfx <- read.csv("prevalentfxnew.csv", header = TRUE)

pred <- prediction(prevalentfx$l1_hu, prevalentfx$fx)
perf <- performance(pred, "tpr", "fpr")

plot(perf, = c(50,90,110,120), points.pch = 20, points.col = "darkblue",

And here is what comes out:
Not what I expected!

This didn't make sense to me because according to the few thresholds where I calculated sensitivity and specificity manually (in the table), 50 HU is the least sensitive threshold and 120 is the most sensitive. Additionally, I feel like the curve is flipped along the diagonal axis. I know that this test is not that poor.

I figured this issue was due to the fact that a person is 'test positive' if the value is below the threshold, not above them. So, I just created a new vector of values where I flipped the binary classification and re-created the ROC plot, and got a figure which aligns much better with the data. However, the threshold values are still opposite of what they should be.

Is there something fundamentally wrong with how I'm looking at this? I have double checked our data several times to make sure I wasn't miscalculating the sensitivity and specificity values, and it all looks right. Thanks.


Here is a working example:


low <- rnorm(200, mean = 73, sd = 42)
high<- rnorm(3000, mean = 133, sd = 51.5)

measure <- c(low, high)
df = data.frame(measure)

df$fx <-, 200)
df$fx[201:3200] <-,3000)

pred <- prediction(df$measure, df$fx)
perf <- performance(pred, "tpr", "fpr")

plot(perf,,90,110,120), points.pch = 20, points.col = "darkblue",

Answer Source

The easiest solution (although inelegant) might be to use the negative values (rather than reversing your classification):

pred <- prediction(-df$measure, df$fx)
perf <- performance(pred, "tpr", "fpr")
     points.pch = 20, points.col = "darkblue", 

enter image description here

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download