I'm new to time series and used the monthly ozone concentration data from Rob Hyndman's website to do some forecasting.
After doing a log transformation and differencing by lags 1 and 12 to get rid of the trend and seasonality respectively, I plotted the ACF and PACF shown [in this image]. Am I on the right track and how would I interpret this as a SARIMA?
There seems to be a pattern every 11 lags in the PACF plot, which makes me think I should do more differencing (at 11 lags), but doing so gives me a worse plot.
I'd really appreciate any of your help!
I got rid of the differencing at lag 1 and just used lag 12 instead, and this is what I got for the ACF and PACF.
From there, I deduced that: SARIMA(1,0,1)x(1,1,1) (AIC: 520.098)
or SARIMA(1,0,1)x(2,1,1) (AIC: 521.250)
would be a good fit, but auto.arima gave me (3,1,1)x(2,0,0) (AIC: 560.7) normally and (1,1,1)x(2,0,0) (AIC: 558.09) without stepwise and approximation.
I am confused on which model to use, but based on the lowest AIC, SAR(1,0,1)x(1,1,1) would be the best? Also, the thing that concerns me is that none of the models pass the Ljung-Box test. Is there any way I can fix this?
It is quite difficult to manually select a model order that will perform well at forecasting a dataset. This is why Rob has built the 'auto.arima' function in his R forecast package, to figure out the model that may perform best based on certain metrics.
When you see a pacf plot with significantly negative lags that usually means you have over differenced your data. Try removing the 1st order difference and keeping the 12 order difference. Then carry on making your best guess.
I'd recommend trying his auto.arima function and passing it a time series object with frequency = 12. He has a good writeup of seasonal arima models here:
If you would like more insight into manually selecting a SARIMA model order, this is a good read: