Toni - 1 year ago 72
R Question

# Predicting values in OLS linear model with indexed explanatory variable (years)

This is one of those questions for which there are probably a million ways to go about it that would make the actual answer irrelevant, but stubbornness gets in the way...

In trying to understand the application of time series, it is clear that de-trending the data makes predicting future values implausible. For instance, using the

`gtemp`
dataset from the
`astsa`
package, the trend upward in the past decades needs to be factored in:

So I end up with an ARIMA model (right or wrong) for the de-trended data, which allows me to "forecast" 10 years ahead:

``````fit = arima(gtemp, order = c(4, 1, 1))
pred = predict(fit, n.ahead = 10)
``````

and an OLS trend estimator based on the values since 1950:

``````gtemp1 = window(gtemp, start = 1950, end = 2009)
fit2 = lm(gtemp1 ~ I(1950:2009))
``````

The issue is how to use
`predict()`
to get the estimated value for the linear model part in the next 10 years.

If I run
`predict(fit2, data.frame(I(2010:2019)))`
I get the 60 values that I would get just running
`predict(fit2)`
, plus an error message:
`'newdata' had 10 rows but variables found have 60 rows`
.

Answer Source

You need:

``````dat <- data.frame(year = 1950:2009, gtemp1 = as.numeric(gtemp1))
fit2 <- lm(gtemp1 ~ year, data = dat)
unname( predict(fit2, newdata = data.frame(year = 2010:2019)) )
# [1] 0.4928475 0.5037277 0.5146079 0.5254882 0.5363684 0.5472487 0.5581289
# [8] 0.5690092 0.5798894 0.5907697
``````

Alternatively if you don't want to use `data` argument in `lm`, you need:

``````year <- 1950:2009
fit2 <- lm(gtemp1 ~ year)
unname( predict(fit2, newdata = data.frame(year = 2010:2019)) )
# [1] 0.4928475 0.5037277 0.5146079 0.5254882 0.5363684 0.5472487 0.5581289
# [8] 0.5690092 0.5798894 0.5907697
``````

Why your original code fails

When you do `fit2 <- lm(gtemp1 ~ I(1950:2009))`, `lm` is assuming there is a covariate called `I(1950:2009)`:

``````attr(fit2\$terms, "term.labels")  ## names of covariates
# [1] "I(1950:2009)"
``````

When you make prediction later, `predict` will aim to find a variable in your new data frame, called `I(1950:2009)`. However, have a look at the column names of your `newdata`:

``````colnames( data.frame(I(2010:2019)) )
# [1] "X2010.2019"
``````

As a result, `predict.lm` can not find variable `I(1950:2009)` in `newdata`, then it would use the internal model frame `fit2\$model` as `newdata`, and return fitted values by default (which explains why you get 60 values).

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download