Ira Saktor - 1 month ago
R Question

Performance of the Dickey-Fuller test on nearly non-stationary series

When applying an (augmented) Dickey-Fuller test to a nearly non-stationary series (e.g. y_t = 0.97*y_{t-1} + e_t), the test should, from what I have read, perform quite poorly.

Yet when I apply the test to sufficiently long series (1000 time periods), I get very accurate results: the unit root is rejected almost every time. When I run it on shorter series (100 time periods), the results are indeed poor.

Is there a way to quantify the impact of sample size on the DF test?

Or is my code wrong and the test should indeed perform poorly?

For T = 100, the performance is indeed poor:

# set seed for reproducibility
set.seed(3)

# number of repetitions
repetitions <- 100

# length of time series
time <- 100

# generate a matrix of 100 nearly non-stationary AR(1) processes (one per column)
nearlyNonStationaryMatrix <- NULL
for (i in 1:repetitions)
{
  # create vector of iid random errors
  errors2 <- rnorm(time, mean = 0, sd = 2)
  # create placeholder for the process
  # NOTE: temp2 has time + 1 entries but the loop below fills only 2..time,
  # so the first and the last entry stay at 0
  temp2 <- rep(0, time + 1)
  for (j in 2:time)
  {
    temp2[j] = 0.95*temp2[j-1] + errors2[j - 1]
  }

  # bind the new process to the previous processes
  nearlyNonStationaryMatrix <- cbind(nearlyNonStationaryMatrix, temp2)
}

# Augmented Dickey Fuller test
library(tseries)

temp <- NULL
pvals <- NULL
for (i in 1:ncol(nearlyNonStationaryMatrix))
{
  # p-value of the ADF test for the i-th simulated series
  temp <- adf.test(nearlyNonStationaryMatrix[, i])$p.value
  pvals <- c(pvals, temp)
}

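# number of series (out of 100) for which the unit-root null is NOT rejected
# at the 10%, 5% and 1% levels
sum(pvals > 0.1)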
## [1] 83
sum(pvals > 0.05)
## [1] 90
sum(pvals > 0.01)
## [1] 98


For T = 1000, not so much:

# set seed for reproducibility
set.seed(3)

# number of repetitions
repetitions <- 100

# length of time series
time <- 1000

# generate a matrix of 100 nearly non-stationary AR(1) processes (one per column)
nearlyNonStationaryMatrix <- NULL
for (i in 1:repetitions)
{
  # create vector of iid random errors
  errors2 <- rnorm(time, mean = 0, sd = 2)
  # create placeholder for the process
  # NOTE: temp2 has time + 1 entries but the loop below fills only 2..time,
  # so the first and the last entry stay at 0
  temp2 <- rep(0, time + 1)
  for (j in 2:time)
  {
    temp2[j] = 0.95*temp2[j-1] + errors2[j - 1]
  }

  # bind the new process to the previous processes
  nearlyNonStationaryMatrix <- cbind(nearlyNonStationaryMatrix, temp2)
}

# Augmented Dickey Fuller test
library(tseries)

temp <- NULL
pvals <- NULL
for (i in 1:ncol(nearlyNonStationaryMatrix))
{
  # p-value of the ADF test for the i-th simulated series
  temp <- adf.test(nearlyNonStationaryMatrix[, i])$p.value
  pvals <- c(pvals, temp)
}

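# number of series (out of 100) for which the unit-root null is NOT rejected
# at the 10%, 5% and 1% levels
sum(pvals > 0.1)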
## [1] 0
sum(pvals > 0.05)
## [1] 0
sum(pvals > 0.01)
## [1] 5


Any thoughts or comments will be much appreciated! :)

Answer

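The ADF test is known to have low power against near-unit-root alternatives in finite samples. It is nevertheless consistent: for a fixed stationary alternative, its power tends to 1 as the sample size grows.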
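One way to see why the sample size matters so much is the usual local-to-unity parametrization. This is a sketch added for illustration, not part of the original simulation, and the symbol c is an assumption introduced here. Writing the AR coefficient as rho = 1 - c/T, the large-sample power of the test depends on c rather than on rho alone, so the same rho = 0.95 is "close" to a unit root at T = 100 but comparatively far from it at T = 1000:

```latex
\begin{align*}
y_t &= \rho\, y_{t-1} + e_t, \qquad H_0:\ \rho = 1 \quad \text{vs.} \quad H_1:\ |\rho| < 1, \\
\rho &= 1 - \frac{c}{T} \quad\Longrightarrow\quad c = T\,(1 - \rho), \\
\rho = 0.95,\ T = 100 &:\quad c = 100 \times 0.05 = 5, \\
\rho = 0.95,\ T = 1000 &:\quad c = 1000 \times 0.05 = 50.
\end{align*}
```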

set.seed(3)
library(tseries)
repetitions <- 10000
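sim <- function(time){
  # simulate `repetitions` AR(1) series with coefficient 0.95 (near unit root), one per column
  x <- replicate(repetitions, arima.sim(list(ar=0.95), time, innov=rnorm(time, mean=0, sd=2)))
  pvals <- rep(0, repetitions)
  for (i in 1:ncol(x))
  {
    # ADF p-value for the i-th simulated series
    pvals[i] <- adf.test(x[, i])$p.value
  }
  pvals
}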
sample_sizes <- seq(100, 1000, by=100)
res <- lapply(sample_sizes, sim)
# share of p-values <= 0.01 per sample size, i.e. the rejection rate at the 1% level
setNames(colSums(sapply(res, `<=`, 0.01))/repetitions, paste0("T=", sample_sizes))
#  T=100  T=200  T=300  T=400  T=500  T=600  T=700  T=800  T=900 T=1000 
# 0.0126 0.0319 0.0716 0.1538 0.2835 0.4207 0.6130 0.7400 0.8580 0.9322 

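This gives the simulated power of the test (the share of rejections at the 1% level) as a function of the sample size: it rises from about 1% at T = 100 to about 93% at T = 1000.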
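Since these power figures are Monte Carlo estimates, it can help to attach a simulation error to them. The following is a minimal sketch, assuming each rejection rate is a binomial proportion over 10000 independent repetitions. The numbers are copied from the output above, and the names `power_hat`, `mc_se` and `power_table` are introduced here purely for illustration.

```r
# Power estimates copied from the simulation output above
# (1% level, 10000 repetitions per sample size)
sample_sizes <- seq(100, 1000, by = 100)
power_hat <- c(0.0126, 0.0319, 0.0716, 0.1538, 0.2835,
               0.4207, 0.6130, 0.7400, 0.8580, 0.9322)
repetitions <- 10000

# Binomial standard error of each estimated rejection rate
mc_se <- sqrt(power_hat * (1 - power_hat) / repetitions)

# Approximate 95% intervals based on the normal approximation
power_table <- data.frame(
  T     = sample_sizes,
  power = power_hat,
  se    = round(mc_se, 4),
  lower = round(power_hat - 1.96 * mc_se, 4),
  upper = round(power_hat + 1.96 * mc_se, 4)
)
print(power_table)
```

With 10000 repetitions the standard error of a proportion cannot exceed 0.005, so differences in power between neighbouring sample sizes in the table are far larger than the simulation noise.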