rfsrc - 6 months ago 52

R Question

I am working with some log-normal data, and naturally I want to demonstrate log-normal distribution results in a better overlap than other possible distributions. Essentially, I want to replicate the following graph with my data:

where the fitted density curves are juxtaposed over

`log(time)`

The text where the linked image is from describes the process as fitting each model and obtaining the following parameters:

For that purpose, I fitted four naive survival models with the above-mentioned distributions:

`survreg(Surv(time,event)~1,dist="family")`

and extracted the shape parameter (α) and the coefficient (β).

I have several questions regarding the process:

1) Is this the right way of going about it? I have looked into several R packages but couldn't locate one that plots density curves as a built-in function, so I feel like I must be overlooking something obvious.

2) Do the values corresponding log-normal distribution (μ and σ$^2$) just the mean and the variance of the intercept?

3) How can I create a similar table in R? (Maybe this is more of a stack overflow question) I know I can just

`cbind`

`survreg`

`survreg.obj$coefficients`

4) Most importantly, how can I plot a similar graph? I thought it would be fairly simple if I just extract the parameters and plot them over the histrogram, but so far no luck. The author of the text says he estimated the density curves from the parameters, but I just get a point estimate - what am I missing? Should I calculate the density curves manually based on distribution before plotting?

I am not sure how to provide a mwe in this case, but honestly I just need a general solution for adding multiple density curves to survival data. On the other hand, if you think it will help, feel free to recommend a mwe solution and I will try to produce one.

Thanks for your input!

Edit: Based on eclark's post, I have made some progress. My parameters are:

`Dist = data.frame(`

Exponential = rweibull(n = 10000, shape = 1, scale = 6.636684),

Weibull = rweibull(n = 10000, shape = 6.068786, scale = 2.002165),

Gamma = rgamma(n = 10000, shape = 768.1476, scale = 1433.986),

LogNormal = rlnorm(n = 10000, meanlog = 4.986, sdlog = .877)

)

However, given the massive difference in scales, this is what I get:

Going back to question number 3, is this how I should get the parameters?

Currently this is how I do it (sorry for the mess):

`summary(fit.exp)`

Call:

survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,

dist = "exponential")

Value Std. Error z p

(Intercept) 6.64 0.052 128 0

Scale fixed at 1

Exponential distribution

Loglik(model)= -2825.6 Loglik(intercept only)= -2825.6

Number of Newton-Raphson Iterations: 6

n= 397

summary(fit.wei)

Call:

survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,

dist = "weibull")

Value Std. Error z p

(Intercept) 6.069 0.1075 56.5 0.00e+00

Log(scale) 0.694 0.0411 16.9 6.99e-64

Scale= 2

Weibull distribution

Loglik(model)= -2622.2 Loglik(intercept only)= -2622.2

Number of Newton-Raphson Iterations: 6

n= 397

summary(fit.gau)

Call:

survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,

dist = "gaussian")

Value Std. Error z p

(Intercept) 768.15 72.6174 10.6 3.77e-26

Log(scale) 7.27 0.0372 195.4 0.00e+00

Scale= 1434

Gaussian distribution

Loglik(model)= -3243.7 Loglik(intercept only)= -3243.7

Number of Newton-Raphson Iterations: 4

n= 397

summary(fit.log)

Call:

survreg(formula = Surv(duration, confterm) ~ 1, data = data.na,

dist = "lognormal")

Value Std. Error z p

(Intercept) 4.986 0.1216 41.0 0.00e+00

Log(scale) 0.877 0.0373 23.5 1.71e-122

Scale= 2.4

Log Normal distribution

Loglik(model)= -2624 Loglik(intercept only)= -2624

Number of Newton-Raphson Iterations: 5

n= 397

I feel like I am particularly messing up the lognormal, given that it is not the standard shape-and-coefficient tandem but the mean and variance.

Answer

Try this; the idea is generating random variables using the random distribtion functions and then plotting the density functions with the output data, here is an example like you need:

```
require(ggplot2)
require(dplyr)
require(tidyr)
SampleData <- data.frame(Duration=rlnorm(n = 184,meanlog = 2.859,sdlog = .246)) #Asume this is data we have sampled from a lognormal distribution
#Then we estimate the parameters for different types of distributions for that sample data and come up for this parameters
#We then generate a dataframe with those distributions and parameters
Dist = data.frame(
Weibull = rweibull(10000,shape = 1.995,scale = 22.386),
Gamma = rgamma(n = 10000,shape = 4.203,scale = 4.699),
LogNormal = rlnorm(n = 10000,meanlog = 2.859,sdlog = .246)
)
#We use gather to prepare the distribution data in a manner better suited for group plotting in ggplot2
Dist <- Dist %>% gather(Distribution,Duration)
#Create the plot that sample data as a histogram
G1 <- ggplot(SampleData,aes(x=Duration)) + geom_histogram(aes(,y=..density..),binwidth=5, colour="black", fill="white")
#Add the density distributions of the different distributions with the estimated parameters
G2 <- G1 + geom_density(aes(x=Duration,color=Distribution),data=Dist)
plot(G2)
```