user918967 - 9 months ago
R Question

Format in R for point prediction of survival analysis

I am befuddled by the format needed to perform a simple prediction using R's survival package:

library(survival)
lung.surv <- survfit(Surv(time,status) ~ 1, data = lung)

So fitting a simple exponential regression (for example purposes only) is:

lung.reg <- survreg(Surv(time,status) ~ 1, data = lung, dist="exponential")

How would I predict the percent survival at time=400?

When I use the following:

myPredict400 <- predict(lung.reg, newdata=data.frame(time=400), type="response")

I get the following:


I was expecting something like 37%, so I am missing something pretty obvious.


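The point of survreg is to fit a parametric distribution (Weibull by default) to the survival times, so that every survival time is associated with a probability. Note that type="response" returns a predicted survival time, not a probability, which is why your call did not give a percentage. Once you have the fitted distribution, you can read off the survival probability for any given time.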
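For the exponential model in the question, this mapping from time to probability can be written in closed form, which may make the idea more concrete. The following is only a sketch: it assumes the intercept-only exponential fit from the question, and the names exp.reg, mu and surv400 are introduced here purely for illustration. With an exponential model, the fitted mean survival time is mu = exp(intercept), and the survival function is S(t) = exp(-t / mu).

```r
library(survival)

# Intercept-only exponential fit, as in the question
exp.reg <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

# Fitted mean survival time in days: exp() of the intercept on the log scale
mu <- exp(unname(coef(exp.reg)))

# Exponential survival function evaluated at t = 400: S(t) = exp(-t / mu)
surv400 <- exp(-400 / mu)
print(surv400)
```

The value in mu is on the time scale, while surv400 is the probability of surviving beyond day 400 under this particular model.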

Try this:

lung.reg <- survreg(Surv(time,status) ~ 1, data = lung)  # no dist argument, so the default Weibull is fitted

pct <- 1:99/100  # cumulative failure probabilities (survival probability is 1 - pct)
# time is not a covariate in this model; newdata only requests a single row of predictions
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400), type='quantile', p=pct)

indx <- which.min(abs(myPredict400 - 400))  # index of the predicted quantile closest to time 400
print(1 - pct[indx]) # 0.39
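The grid lookup above can be cross-checked without a grid. A minimal sketch, assuming the same intercept-only Weibull fit: the survival package provides psurvreg(), which evaluates the cumulative failure probability of a survreg distribution, so the survival probability is its complement. The names wei.reg and surv400 are illustrative, not part of the original answer.

```r
library(survival)

# survreg() fits a Weibull model when no dist argument is given
wei.reg <- survreg(Surv(time, status) ~ 1, data = lung)

# psurvreg() returns P(T <= q) for the fitted distribution;
# "mean" is the linear predictor (here just the intercept)
surv400 <- 1 - psurvreg(400,
                        mean = unname(coef(wei.reg)),
                        scale = wei.reg$scale,
                        distribution = "weibull")
print(surv400)
```

This evaluates the fitted survival function at exactly time 400, so it does not depend on how fine the pct grid is.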

Adapted from the help docs for predict.survreg, here's a plot of it (time in the lung data is measured in days):

matplot(myPredict400, 1-pct, xlab="Days", ylab="Survival", type='l', lty=1, col=1)



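You're basically asking the fitted model for its quantiles over a grid of probabilities (hence 1...99 out of 100). If you let the grid go all the way to 100/100, the last value of your prediction is Inf, because the survival time at the 100th percentile is infinite. This is what type='quantile' and the p argument do: for each probability in p, predict returns the time by which that fraction of subjects is expected to have failed.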

For example, setting pct = 1:999/1000 gives a finer grid, so the survival probability you read off from myPredict400 is more precise. Also, if you set pct to some value that's not a proper probability (i.e. less than 0 or more than 1), you won't get a meaningful prediction (expect an error or NaN). I suggest you play with these values and see how they impact your survival rates.
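To see the effect of the grid size, the lookup can be wrapped in a small helper and run at two resolutions. This is a sketch under the same assumptions as the answer (intercept-only Weibull fit on the lung data); grid.surv is a hypothetical helper name introduced here, not a function from the survival package.

```r
library(survival)

wei.reg <- survreg(Surv(time, status) ~ 1, data = lung)

# Hypothetical helper: survival probability at time t, read off a grid
# of n - 1 equally spaced failure probabilities 1/n, ..., (n - 1)/n
grid.surv <- function(fit, t, n) {
  pct <- (1:(n - 1)) / n
  q <- predict(fit, newdata = data.frame(time = t), type = "quantile", p = pct)
  1 - pct[which.min(abs(q - t))]
}

coarse <- grid.surv(wei.reg, 400, 100)   # steps of 0.01
fine   <- grid.surv(wei.reg, 400, 1000)  # steps of 0.001
print(c(coarse = coarse, fine = fine))
```

The two results should agree to about the width of the coarser grid step; the finer grid simply narrows down where time 400 falls between neighbouring quantiles.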