ambroise - 1 year ago 58

R Question

I am working on evaluating a screening test for osteoporosis, and I have a large set of data where we measured values of bone density. We classified individuals as being 'disease positive' for osteoporosis if they had a vertebral fracture present on the images when we took the bone density measure.

The 'disease positive' has a lower distribution of the continuous value than the disease negative group.

We want to determine which threshold for the continuous variable is best for determining if an individual is at a higher risk for future fractures. We've found that the lower the value is, the higher the risk. I used Stata to create some tables to calculate sensitivity and specificity at a few different thresholds. Again, a person is 'test positive' if their value is *below* the threshold. I made this table here:

We wanted to show this in graphical form, so I decided to make an ROC curve, and I used the ROCR package to do so. Here is the code I used in R:

`library(ROCR)`

prevalentfx <- read.csv("prevalentfxnew.csv", header = TRUE)

pred <- prediction(prevalentfx$l1_hu, prevalentfx$fx)

perf <- performance(pred, "tpr", "fpr")

plot(perf, print.cutoffs.at = c(50,90,110,120), points.pch = 20, points.col = "darkblue",

text.adj=c(1.2,-0.5))

And here is what comes out:

Not what I expected!

This didn't make sense to me because according to the few thresholds where I calculated sensitivity and specificity manually (in the table), 50 HU is the

I figured this issue was due to the fact that a person is 'test positive' if the value is

Is there something fundamentally wrong with how I'm looking at this? I have double checked our data several times to make sure I wasn't miscalculating the sensitivity and specificity values, and it all looks right. Thanks.

EDIT:

Here is a working example:

`library(ROCR)`

low <- rnorm(200, mean = 73, sd = 42)

high<- rnorm(3000, mean = 133, sd = 51.5)

measure <- c(low, high)

df = data.frame(measure)

df$fx <- rep.int(1, 200)

df$fx[201:3200] <- rep.int(0,3000)

pred <- prediction(df$measure, df$fx)

perf <- performance(pred, "tpr", "fpr")

plot(perf,print.cutoffs.at=c(50,90,110,120), points.pch = 20, points.col = "darkblue",

text.adj=c(1.2,-0.5))

Answer Source

The easiest solution (although inelegant) might be to use the negative values (rather than reversing your classification):

```
pred <- prediction(-df$measure, df$fx)
perf <- performance(pred, "tpr", "fpr")
plot(perf,
print.cutoffs.at=-c(50,90,110,120),
cutoff.label.function=`-`,
points.pch = 20, points.col = "darkblue",
text.adj=c(1.2,-0.5))
```