VINAY S G - 5 months ago
Python Question

Stationarity of a time series data

I am trying to model a time series using ARIMA in Python. I ran the function

statsmodels.tsa.stattools.arma_order_select_ic
on the original (undifferenced) data series and got p and q as 2 and 2 respectively. The code is below:

import pandas as pd
import statsmodels.api as sm
from pandas import Series

dates=pd.date_range('2010-11-1','2011-01-30')
dataseries=Series([22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,576,474,536,512,464,436,24,448,408,528,
602,638,640,26,658,548,620,534,422,482,26,616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,610,456,22,630,408,390,24],index=dates)
df=pd.DataFrame({'Consumption':dataseries})
df

sm.tsa.arma_order_select_ic(df, max_ar=4, max_ma=2, ic='aic')


The result is as follows:

{'aic':              0            1            2
0  1262.244974  1264.052640  1264.601342
1  1264.098325  1261.705513  1265.604662
2  1264.743786  1265.015529  1246.347400
3  1265.427440  1266.378709  1266.430373
4  1266.358895  1267.674168          NaN, 'aic_min_order': (2, 2)}


But when I run the Augmented Dickey-Fuller test, the result indicates that the series is not stationary.

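d_order0 = sm.tsa.adfuller(dataseries)
print 'adf: ', d_order0[0]
print 'p-value: ', d_order0[1]
print 'Critical values: ', d_order0[4]

# Compare the test statistic with the 5% critical value.
# Note: d is not defined in this snippet; the output below shows it held 1.
if d_order0[0] > d_order0[4]['5%']:
    print 'Time Series is nonstationary'
    print d
else:
    print 'Time Series is stationary'
    print d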


The output is as follows:

adf: -1.96448506629
p-value: 0.302358888762
Critical values: {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153}
Time Series is nonstationary
1


When I cross-checked the results in R, it reported that the original series is stationary. Why, then, does the Augmented Dickey-Fuller test in Python report a non-stationary series?

Answer

Your data clearly has some seasonality: the autocorrelations shown below peak at lags 7 and 14, which points to a weekly pattern. ARMA models and stationarity tests therefore need to be applied with care.
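If you want to check the weekly pattern without leaving Python, here is a minimal sketch in plain Python (no statsmodels needed). The function name sample_acf is my own, and it assumes the standard estimator: the lag-k autocovariance of the demeaned series divided by the lag-0 autocovariance.

```python
# Consumption values copied from the question (91 daily observations).
data = [22, 624, 634, 774, 726, 752, 38, 534, 722, 678, 750, 690, 686, 26,
        708, 606, 632, 632, 632, 584, 28, 576, 474, 536, 512, 464, 436, 24,
        448, 408, 528, 602, 638, 640, 26, 658, 548, 620, 534, 422, 482, 26,
        616, 612, 622, 598, 614, 614, 24, 644, 506, 522, 622, 526, 26, 22,
        738, 582, 592, 408, 466, 568, 44, 680, 652, 598, 642, 714, 562, 38,
        778, 796, 742, 460, 610, 42, 38, 732, 650, 670, 618, 574, 42, 22,
        610, 456, 22, 630, 408, 390, 24]


def sample_acf(x, max_lag):
    """Return sample autocorrelations for lags 0..max_lag (hypothetical helper)."""
    n = len(x)
    mean = sum(x) / float(n)
    dev = [v - mean for v in x]          # demeaned series
    c0 = sum(d * d for d in dev)         # lag-0 sum of squares
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


acf = sample_acf(data, 14)
for lag in (1, 7, 14):
    # Lags 7 and 14 should stand out if the pattern is weekly.
    print(lag, round(acf[lag], 3))
```

The near-zero consumption values recur every seventh day, which is what drives the large autocorrelation at lag 7.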

The difference between the ADF test results in Python and R apparently comes from the default number of lags each package uses.

> (nobs=length(dataseries))
[1] 91
> 12*(nobs/100)^(1/4)  #python default
[1] 11.72038
> trunc((nobs-1)^(1/3)) #R default
[1] 4
> acf(coredata(dataseries),plot = F)

Autocorrelations of series ‘coredata(dataseries)’, by lag

     0      1      2      3      4      5      6      7      8      9     10     11 
 1.000  0.039 -0.116 -0.124 -0.094 -0.148  0.083  0.645 -0.072 -0.135 -0.138 -0.146 
    12     13     14     15     16     17     18     19 
-0.185  0.066  0.502 -0.097 -0.151 -0.165 -0.195 -0.160 
> adf.test(dataseries,k=12)

    Augmented Dickey-Fuller Test

data:  dataseries
Dickey-Fuller = -2.6172, Lag order = 12, p-value = 0.322
alternative hypothesis: stationary

> adf.test(dataseries,k=4)

    Augmented Dickey-Fuller Test

data:  dataseries
Dickey-Fuller = -6.276, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(dataseries, k = 4) : p-value smaller than printed p-value
> adf.test(dataseries,k=7)

    Augmented Dickey-Fuller Test

data:  dataseries
Dickey-Fuller = -2.2571, Lag order = 7, p-value = 0.4703
alternative hypothesis: stationary
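The two default lag lengths computed at the top of the R session can also be reproduced in Python. This is a small sketch, not the libraries' own code: the function names are mine, and the statsmodels rule (rounding the Schwert formula up when maxlag is not given) is my understanding of its behaviour, which may vary between versions.

```python
import math


def statsmodels_default_maxlag(nobs):
    # Assumed statsmodels rule when maxlag is None: ceil(12 * (nobs/100)^(1/4)).
    return int(math.ceil(12.0 * (nobs / 100.0) ** 0.25))


def r_adf_default_lag(nobs):
    # Default k in R's tseries::adf.test: trunc((nobs - 1)^(1/3)).
    return int((nobs - 1) ** (1.0 / 3.0))


nobs = 91  # length of the series in the question
print(statsmodels_default_maxlag(nobs))  # 12
print(r_adf_default_lag(nobs))           # 4
```

To compare like with like in statsmodels, you can fix the lag yourself by passing maxlag=4 and autolag=None to adfuller. Note also that adf.test in R fits a constant and a linear trend, while adfuller defaults to regression='c' (constant only), so regression='ct' is the closer match. I have not checked that this reproduces the R statistic exactly.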