VINAY S G - 8 months ago 184

Python Question

I am trying to model time series data using ARIMA modelling in Python. I used the function `statsmodels.tsa.stattools.arma_order_select_ic`:

```
import pandas as pd
from pandas import Series
import statsmodels.api as sm

# 91 daily observations, 2010-11-01 through 2011-01-30
dates = pd.date_range('2010-11-1', '2011-01-30')
dataseries = Series([22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,576,474,536,512,464,436,24,448,408,528,
                     602,638,640,26,658,548,620,534,422,482,26,616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
                     44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,610,456,22,630,408,390,24], index=dates)
df = pd.DataFrame({'Consumption': dataseries})
df
# Compare AIC across ARMA(p, q) orders up to p=4, q=2
sm.tsa.arma_order_select_ic(df, max_ar=4, max_ma=2, ic='aic')
```

The result is as follows:

```
{'aic':              0            1            2
0  1262.244974  1264.052640  1264.601342
1  1264.098325  1261.705513  1265.604662
2  1264.743786  1265.015529  1246.347400
3  1265.427440  1266.378709  1266.430373
4  1266.358895  1267.674168          NaN, 'aic_min_order': (2, 2)}
```

But when I run the augmented Dickey-Fuller test, the result indicates that the series is not stationary.

```
d_order0 = sm.tsa.adfuller(dataseries)
print('adf:', d_order0[0])
print('p-value:', d_order0[1])
print('Critical values:', d_order0[4])

# Compare the test statistic with the 5% critical value
if d_order0[0] > d_order0[4]['5%']:
    print('Time Series is nonstationary')
    print(d)  # d is not defined in the code shown here
else:
    print('Time Series is stationary')
    print(d)
```

The output is as follows:

```
adf: -1.96448506629
p-value: 0.302358888762
Critical values: {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153}
Time Series is nonstationary
1
```

When I cross-checked the results in R, it reported that the original series is stationary. Why, then, does the augmented Dickey-Fuller test in Python report a non-stationary series?

Answer

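Your data clearly has some seasonality: the autocorrelation in the R output below peaks at lags 7 and 14 (0.645 and 0.502), which suggests a weekly cycle in this daily series. ARMA modelling and stationarity tests therefore need to be done carefully.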
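A quick way to check for this kind of seasonality is to look at the autocorrelation at a few lags. The following is only a minimal sketch in plain Python: the function name `autocorrelation` and the toy series are hypothetical, made up for illustration, and the estimator is the usual biased sample autocorrelation rather than anything specific to statsmodels or R.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at `lag` (biased estimator)."""
    n = len(values)
    mean = sum(values) / float(n)
    deviations = [v - mean for v in values]
    # Total variation around the mean (lag-0 term).
    denominator = sum(d * d for d in deviations)
    # Co-variation between the series and itself shifted by `lag`.
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical toy series: six quiet days and one spike, repeated for ten weeks.
toy = [0, 0, 0, 0, 0, 0, 1] * 10
for lag in (1, 7, 14):
    print(lag, round(autocorrelation(toy, lag), 3))
```

Passing the question's `dataseries` values as a list instead of `toy` should show the same pattern of a large value at lag 7, in line with the 0.645 reported by R's `acf`.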

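The ADF results apparently differ between Python and R because each uses a different default number of lags. With 91 observations, Python's formula gives about 11.7 (the line marked `#python default` below), while R's default is 4. Running R's `adf.test` with `k=12` or `k=7` no longer rejects the unit root (p-values 0.322 and 0.4703), whereas `k=4` does (p-value below 0.01):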

```
> (nobs=length(dataseries))
[1] 91
> 12*(nobs/100)^(1/4) #python default
[1] 11.72038
> trunc((nobs-1)^(1/3)) #R default
[1] 4
> acf(coredata(dataseries),plot = F)
Autocorrelations of series ‘coredata(dataseries)’, by lag
0 1 2 3 4 5 6 7 8 9 10 11
1.000 0.039 -0.116 -0.124 -0.094 -0.148 0.083 0.645 -0.072 -0.135 -0.138 -0.146
12 13 14 15 16 17 18 19
-0.185 0.066 0.502 -0.097 -0.151 -0.165 -0.195 -0.160
> adf.test(dataseries,k=12)
Augmented Dickey-Fuller Test
data: dataseries
Dickey-Fuller = -2.6172, Lag order = 12, p-value = 0.322
alternative hypothesis: stationary
> adf.test(dataseries,k=4)
Augmented Dickey-Fuller Test
data: dataseries
Dickey-Fuller = -6.276, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary
Warning message:
In adf.test(dataseries, k = 4) : p-value smaller than printed p-value
> adf.test(dataseries,k=7)
Augmented Dickey-Fuller Test
data: dataseries
Dickey-Fuller = -2.2571, Lag order = 7, p-value = 0.4703
alternative hypothesis: stationary
```
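To make the two tools comparable from the Python side, the lag count can be fixed explicitly instead of left to the default. The sketch below rests on a few assumptions: the helper names are hypothetical, the two formulas are simply the ones quoted in the R session above, and `regression="ct"` is chosen on the assumption that the `adf.test` used here is the one from R's tseries package, which fits a constant and a linear trend. In statsmodels, `autolag=None` tells `adfuller` to use `maxlag` as the lag count rather than searching for one.

```python
import math


def statsmodels_default_maxlag(nobs):
    """Lag bound from the formula quoted above: 12 * (nobs / 100) ** (1 / 4)."""
    return 12.0 * (nobs / 100.0) ** 0.25


def r_default_lag(nobs):
    """Lag order from the formula quoted above: trunc((nobs - 1) ** (1 / 3))."""
    return math.trunc((nobs - 1) ** (1.0 / 3.0))


def adf_fixed_lag(series, lag, regression="ct"):
    """Run statsmodels' ADF test with exactly `lag` lags (no automatic selection)."""
    # Imported here so the two helpers above work without statsmodels installed.
    from statsmodels.tsa.stattools import adfuller

    # With autolag=None the result has five elements (no information criterion).
    stat, pvalue, usedlag, nobs, crit = adfuller(
        series, maxlag=lag, regression=regression, autolag=None
    )
    return {"adf": stat, "p-value": pvalue, "lags": usedlag, "critical": crit}


nobs = 91
print(statsmodels_default_maxlag(nobs))  # about 11.72, matching the R session
print(r_default_lag(nobs))               # 4
```

Calling `adf_fixed_lag(dataseries, r_default_lag(len(dataseries)))` on the question's series would then be the closest Python counterpart to `adf.test(dataseries, k=4)`; the statistics may still differ slightly between the two implementations.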