Lyndz - 1 year ago

R Question

I have a time series of rainfall values in a csv file. I plotted a histogram of the data, and it is skewed to the left. I want to transform the values so that they are approximately normally distributed, so I used the Yeo-Johnson transform available in R. The transformed values are here.

My question is:

In the above transformation, I used a test value of 0.5 for lambda, which works fine. Is there a way to determine the optimal value of lambda from the time series itself? I'd appreciate any suggestions.

So far, here's the code:

```
library(car)

dat <- scan("Zamboanga.csv")
hist(dat)

trans <- yjPower(dat, 0.5, jacobian.adjusted = TRUE)
hist(trans)
```

Here is the csv file.

Answer Source

First, find the optimal lambda with the `boxCox` function from the car package, which estimates λ by maximum likelihood.

You can plot it like this:

```
boxCox(your_model, family="yjPower", plotit = TRUE)
```

As Ben Bolker said in a comment, the model here could be something like

```
your_model <- lm(dat~1)
```

Then use the optimized lambda in your existing code, in place of the test value of 0.5 in the `yjPower` call.
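Putting the steps together, here is a minimal sketch of one way to do it. It assumes that `boxCox` returns the lambda grid and the profile log-likelihood as a list with elements `x` and `y`, and it uses simulated gamma-distributed values as a hypothetical stand-in for the rainfall series, since the csv file is not reproduced here. The name `lambda_hat` is only illustrative.

```
library(car)

# Hypothetical stand-in for the rainfall series; with the real data this
# would be dat <- scan("Zamboanga.csv") as in the question.
set.seed(1)
dat <- rgamma(200, shape = 2, scale = 10)

# Intercept-only model, as suggested in the comment
your_model <- lm(dat ~ 1)

# Profile log-likelihood over the default lambda grid, without plotting
bc <- boxCox(your_model, family = "yjPower", plotit = FALSE)

# Take the grid value with the highest log-likelihood as the estimate
lambda_hat <- bc$x[which.max(bc$y)]

# Plug the estimate into the original transformation
trans <- yjPower(dat, lambda_hat, jacobian.adjusted = TRUE)
hist(trans)
```

Because the estimate is picked from a grid, it is only as fine as the grid spacing; passing a denser `lambda` sequence to `boxCox` should tighten it. If you would rather have the maximum likelihood estimate directly, `powerTransform(your_model, family = "yjPower")` from the same package, followed by `coef()`, may be worth trying.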