Ira Saktor - 7 months ago 38

R Question

When applying an (augmented) Dickey-Fuller test to nearly non-stationary series (e.g. y_t = 0.97*y_{t-1} + e_t), the test should, from what I have read, perform quite poorly, i.e. have low power to reject the unit-root null.
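A process like the one above can be simulated without an explicit loop by using a recursive linear filter. This is a minimal sketch that assumes Gaussian innovations and a start value of 0; the helper name `simulate_ar1` and its `burn_in` argument are hypothetical, not part of any package.

```r
# Hypothetical helper: simulate y_t = phi * y_{t-1} + e_t, e_t ~ N(0, sd^2),
# starting from y_0 = 0, so that y_1 = e_1 when there is no burn-in.
simulate_ar1 <- function(n, phi, sd = 1, burn_in = 0) {
  e <- rnorm(n + burn_in, mean = 0, sd = sd)
  # method = "recursive" computes y[t] = e[t] + phi * y[t - 1], with zero initial value
  y <- as.numeric(stats::filter(e, filter = phi, method = "recursive"))
  # Optionally discard the first burn_in values to reduce the influence of y_0 = 0
  if (burn_in > 0) y <- y[-seq_len(burn_in)]
  y
}

set.seed(3)
y <- simulate_ar1(100, phi = 0.97, sd = 2)
```

The returned vector has exactly `n` elements, so every value is a genuine draw from the recursion.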

Yet when I apply this test to sufficiently long series (1000 time periods), I get very accurate results. When I perform the test on shorter series (100 time periods), the results are indeed poor.

Is there some way to quantify the impact of sample size on the DF test?

Or is my code wrong and the test should indeed perform poorly?

For T = 100, performance is indeed poor:

```
# set seed for reproducibility
set.seed(3)

# number of repetitions
repetitions <- 100

# length of time series
time <- 100

# generate a matrix of 100 nearly non-stationary processes of length 100
nearlyNonStationaryMatrix <- NULL

for (i in 1:repetitions) {
  # create vector of iid random errors
  errors2 <- rnorm(time, mean = 0, sd = 2)

  # create placeholder for the process
  temp2 <- rep(0, time + 1)

  for (j in 2:time) {
    temp2[j] <- 0.95 * temp2[j - 1] + errors2[j - 1]
  }

  # bind recent process to previous processes
  nearlyNonStationaryMatrix <- cbind(nearlyNonStationaryMatrix, temp2)
}

# Augmented Dickey-Fuller test
library(tseries)

pvals <- NULL

for (i in 1:ncol(nearlyNonStationaryMatrix)) {
  temp <- adf.test(nearlyNonStationaryMatrix[, i])$p.value
  pvals <- c(pvals, temp)
}

sum(pvals > 0.1)
## [1] 83

sum(pvals > 0.05)
## [1] 90

sum(pvals > 0.01)
## [1] 98
```

For T = 1000, not so much:

```
# set seed for reproducibility
set.seed(3)

# number of repetitions
repetitions <- 100

# length of time series
time <- 1000

# generate a matrix of 100 nearly non-stationary processes of length 1000
nearlyNonStationaryMatrix <- NULL

for (i in 1:repetitions) {
  # create vector of iid random errors
  errors2 <- rnorm(time, mean = 0, sd = 2)

  # create placeholder for the process
  temp2 <- rep(0, time + 1)

  for (j in 2:time) {
    temp2[j] <- 0.95 * temp2[j - 1] + errors2[j - 1]
  }

  # bind recent process to previous processes
  nearlyNonStationaryMatrix <- cbind(nearlyNonStationaryMatrix, temp2)
}

# Augmented Dickey-Fuller test
library(tseries)

pvals <- NULL

for (i in 1:ncol(nearlyNonStationaryMatrix)) {
  temp <- adf.test(nearlyNonStationaryMatrix[, i])$p.value
  pvals <- c(pvals, temp)
}

sum(pvals > 0.1)
## [1] 0

sum(pvals > 0.05)
## [1] 0

sum(pvals > 0.01)
## [1] 5
```

Any thoughts or comments will be much appreciated! :)
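One detail in the loops above: `temp2` is created with length `time + 1`, but only indices `2:time` are ever filled, so every simulated series ends with an extra 0. The sketch below runs the same experiment for any sample size without that trailing value. It assumes the `tseries` package is installed; the function name `non_rejection_rate` is hypothetical, and the p-values it produces will not reproduce the exact counts printed above.

```r
library(tseries)

# Hypothetical helper: fraction of replications in which adf.test() does NOT
# reject the unit-root null at level alpha, for an AR(1) with coefficient phi.
non_rejection_rate <- function(time, phi = 0.95, sd = 2, repetitions = 100, alpha = 0.05) {
  pvals <- replicate(repetitions, {
    e <- rnorm(time, mean = 0, sd = sd)
    # Recursive filter: y[t] = e[t] + phi * y[t - 1]; length(y) == time, no trailing zero
    y <- as.numeric(stats::filter(e, filter = phi, method = "recursive"))
    # adf.test() warns when the reported p-value is only a bound; silence that here
    suppressWarnings(adf.test(y)$p.value)
  })
  mean(pvals > alpha)
}

set.seed(3)
rates <- sapply(c(T100 = 100, T1000 = 1000), non_rejection_rate)
rates
```

Wrapping the experiment in a function keeps the two sample sizes from drifting apart, which is easy to do when the whole script is copied and edited by hand.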

Answer

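ADF is known to perform poorly against near-unit-root alternatives; it is, nevertheless, consistent: against any fixed stationary alternative (such as an AR(1) coefficient of 0.95), its power tends to 1 as the sample size grows.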
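A rough way to see why the sample size matters so much: according to the tseries documentation, `adf.test` uses the regression with a constant and a linear trend, plus k lagged differences, and tests whether the coefficient on y_{t-1} is zero. The symbols α, β, γ, δ_i, k and c below are notation introduced here for illustration.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0\colon \gamma = 0 \ (\text{unit root}), \quad H_1\colon \gamma < 0 \ (\text{stationary})

% For y_t = \phi\, y_{t-1} + e_t we have \gamma = \phi - 1.
% Local-to-unity reparametrisation:
\phi = 1 + \frac{c}{T} \iff c = T(\phi - 1)

% With \phi = 0.95:
c = -0.05\,T: \qquad T = 100 \Rightarrow c = -5, \qquad T = 1000 \Rightarrow c = -50
```

Under local-to-unity asymptotics, the limiting power of Dickey-Fuller-type tests against φ = 1 + c/T depends on c, so the same φ = 0.95 behaves like an alternative ten times farther from the unit root at T = 1000 than at T = 100.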

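```
set.seed(3)
library(tseries)

repetitions <- 10000

# For one sample size, simulate `repetitions` near-unit-root AR(1) series
# (coefficient 0.95, innovation sd 2) and return their ADF p-values
sim <- function(time) {
  x <- replicate(repetitions,
                 arima.sim(list(ar = 0.95), time, innov = rnorm(time, mean = 0, sd = 2)))  # near unit root
  pvals <- rep(0, repetitions)
  for (i in 1:ncol(x)) {
    # adf.test() warns when the reported p-value is only a bound; silence it in the loop
    pvals[i] <- suppressWarnings(adf.test(x[, i])$p.value)
  }
  pvals
}

sample_sizes <- seq(100, 1000, by = 100)
res <- lapply(sample_sizes, sim)

# Share of replications rejecting the unit root at the 1% level, per sample size
setNames(colSums(sapply(res, `<=`, 0.01)) / repetitions, paste0("T=", sample_sizes))
# T=100 T=200 T=300 T=400 T=500 T=600 T=700 T=800 T=900 T=1000
# 0.0126 0.0319 0.0716 0.1538 0.2835 0.4207 0.6130 0.7400 0.8580 0.9322
```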

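This gives the empirical power of the test (the rejection rate at the 1% level) as a function of the sample size: about 1.3% at T = 100, rising to about 93% at T = 1000.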
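Each of these power figures is itself a binomial proportion estimated from `repetitions` draws, so it carries a Monte Carlo standard error of sqrt(p(1 - p)/R). The sketch below plugs in the rejection rates printed by the answer's code; the names `power`, `mc_se` and `ci` are illustrative, and the ±1.96·SE interval is a normal approximation.

```r
# Rejection rates at the 1% level, copied from the simulation output
sample_sizes <- seq(100, 1000, by = 100)
power <- c(0.0126, 0.0319, 0.0716, 0.1538, 0.2835,
           0.4207, 0.6130, 0.7400, 0.8580, 0.9322)
repetitions <- 10000

# Binomial Monte Carlo standard error of each estimated rejection rate
mc_se <- sqrt(power * (1 - power) / repetitions)

# Approximate 95% intervals for the true power at each sample size
ci <- cbind(T = sample_sizes,
            power = power,
            lower = power - 1.96 * mc_se,
            upper = power + 1.96 * mc_se)
print(round(ci, 4))

# Worst-case standard error with only 100 repetitions, as in the question
sqrt(0.5 * 0.5 / 100)  # 0.05
```

With 10000 repetitions no standard error can exceed 0.005, whereas with the 100 repetitions used in the question it can be as large as 0.05.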

Source (Stack Overflow)