I am befuddled by the format to perform a simple prediction using R's
lung.surv <- survfit(Surv(time,status) ~ 1, data = lung)
lung.reg <- survreg(Surv(time,status) ~ 1, data = lung, dist="exponential")
myPredict400 <- predict(lung.reg, newdata=data.frame(time=400), type="response")
The point of this model is to fit a parametric distribution (Weibull by default in survreg) to the survival times. Essentially you are associating each survival time with a probability. Once you have that distribution, you can read off the survival probability for a given time.
library(survival)
lung.reg <- survreg(Surv(time, status) ~ 1, data = lung)  # because you want a distribution
pct <- 1:99/100  # grid of probabilities at which to evaluate the fitted quantiles
myPredict400 <- predict(lung.reg, newdata = data.frame(time = 400), type = "quantile", p = pct)
indx <- which.min(abs(myPredict400 - 400))  # find the predicted survival time closest to 400
print(1 - pct[indx])  # 0.39
Adapted from the help docs, here's a plot of the fitted survival curve (the lung times are in days, so the axis is labelled accordingly):
matplot(myPredict400, 1-pct, xlab="Days", ylab="Survival", type='l', lty=1, col=1)
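As a cross-check on the grid lookup, the survival probability at a given time can also be computed directly from the fitted parameters. The following is a minimal sketch, assuming the default Weibull fit from above and the parameter mapping described in ?survreg (Weibull shape = 1/scale, Weibull scale = exp(intercept)). The names wb_shape, wb_scale and surv400 are made up for illustration.

```r
library(survival)

# Same intercept-only Weibull fit as above.
lung.reg <- survreg(Surv(time, status) ~ 1, data = lung)

# Map survreg's parameterisation onto pweibull()'s:
#   shape = 1 / scale,  scale = exp(intercept)
wb_shape <- 1 / lung.reg$scale
wb_scale <- exp(unname(coef(lung.reg)))

# S(400) = P(T > 400) under the fitted Weibull.
surv400 <- pweibull(400, shape = wb_shape, scale = wb_scale, lower.tail = FALSE)
print(surv400)
```

The result should agree with the grid-based value to within the resolution of the grid, since both are read from the same fitted distribution.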
You're basically evaluating the fitted distribution at a grid of probabilities (hence 1...99 out of 100). If you make the grid go to 100, the last value of your prediction is Inf, because the survival time at the 100th percentile is infinite. This is what the p argument (here passed as pct) does.
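The behaviour at the 100th percentile can be seen with base R's Weibull quantile function alone. This is only a sketch: the shape and scale values below are illustrative assumptions, not the parameters of the fitted model.

```r
# Illustrative Weibull parameters (assumed values, not taken from a fit).
wb_shape <- 1.3
wb_scale <- 420

# Quantiles at probabilities approaching, and then equal to, 1.
p <- c(0.9, 0.99, 0.999, 1)
q <- qweibull(p, shape = wb_shape, scale = wb_scale)
print(q)  # increasing values; the last one, at p = 1, is Inf
```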
For example, setting pct = 1:999/1000 gives a finer grid and therefore a more precise value when you look up 400 in myPredict400. Also, if you set pct to some value that's not a proper probability (i.e. less than 0 or more than 1) you'll get an error. I suggest you play with these values and see how they affect your estimated survival probabilities.
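To see the effect of the grid resolution, the same lookup can be run on both grids. This is a rough sketch: surv_from_grid is a hypothetical helper written for this comparison, not part of the survival package, and it assumes an intercept-only fit like the one above.

```r
library(survival)

lung.reg <- survreg(Surv(time, status) ~ 1, data = lung)

# Hypothetical helper: grid-based estimate of S(t) for an intercept-only fit.
# It evaluates the fitted quantiles on the grid `pct` and returns 1 - p for
# the grid point whose predicted time is closest to t.
surv_from_grid <- function(fit, t, pct) {
  q <- as.numeric(predict(fit, newdata = data.frame(time = t),
                          type = "quantile", p = pct))
  1 - pct[which.min(abs(q - t))]
}

coarse <- surv_from_grid(lung.reg, 400, 1:99 / 100)    # steps of 0.01
fine   <- surv_from_grid(lung.reg, 400, 1:999 / 1000)  # steps of 0.001
print(c(coarse = coarse, fine = fine))
```

The two estimates should differ by no more than about one coarse grid step; the finer grid simply narrows down where 400 falls on the fitted curve.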