CCID - 1 year ago 74
R Question

# linear model with `lm`: how to get prediction variance of sum of predicted values

I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum.

``````lm.tree <- lm(Volume ~ poly(Girth,2), data = trees)
``````

Suppose I have a set of
`Girths`
:

``````newdat <- list(Girth = c(10,12,14,16)
``````

for which I want to predict the total
`Volume`
:

``````pr <- predict(lm.tree, newdat, se.fit = TRUE)
total <- sum(pr\$fit)
# [1] 111.512
``````

How can I obtain the variance for
`total`
?

Similar questions are here (for GAMs), but I'm not sure how to proceed with the
`vcov(lm.trees)`
. I'd be grateful for a reference for the method.

Basically you need to obtain full variance-covariance matrix, then sum all its elements. Here is small proof:

The proof here is using another theorem, which you can find from Covariance-wikipedia:

Specifically, the linear transform we take is a column matrix of all 1's. The resulting quadratic form is computed as following, with all "a_ij" being 1.

# Obtain full variance-covariance matrix for prediction

The most trivial way to obtain variance-covariance matrix, is based on `vcov`. For your fitted model `lm.tree`, prediction data `newdat` and prediction matrix `Xp`:

``````## your model
lm.tree <- lm(Volume ~ poly(Girth,2), data = trees)

## newdata
newdat <- list(Girth = c(10,12,14,16))

## prediction matrix
poly_Xp <- poly(unlist(newdat), degree = 2,
coefs = attr(lm.tree\$model\$"poly(Girth, 2)", "coefs"),
simple = TRUE)
Xp <- cbind(1, poly_Xp)

#                 1          2
#[1,] 1 -0.18898835  0.1098920
#[2,] 1 -0.07263008 -0.1036759
#[3,] 1  0.04372819 -0.1776927
#[4,] 1  0.16008645 -0.1121585
``````

We can do:

``````VCOV1 <- Xp %*% vcov(lm.tree) %*% t(Xp)
#            [,1]      [,2]       [,3]       [,4]
#[1,]  0.89022938 0.3846809 0.04967582 -0.1147858
#[2,]  0.38468089 0.5369327 0.52828797  0.3587467
#[3,]  0.04967582 0.5282880 0.73113553  0.6582185
#[4,] -0.11478583 0.3587467 0.65821848  0.7836294
``````

But a much more efficient way is via my function:

``````fast_vcov <- function (lmObject, Xp, diag = TRUE) {
## compute Qt
QR <- lmObject\$qr   ## qr object of fitted model
piv <- QR\$pivot    ## pivoting index
r <- QR\$rank    ## model rank / numeric rank
if (is.unsorted(piv)) {
## pivoting has been done
Qt <- forwardsolve(t(QR\$qr), t(Xp[, piv]), r)
} else {
## no pivoting is done
Qt <- forwardsolve(t(QR\$qr), t(Xp), r)
}
## residual variance
sigma2 <- sum(residuals(lmObject)^2) / df.residual(lmObject)
## return
if (diag) {
## return point-wise prediction variance
## no need to compute full hat matrix
return(colSums(Qt ^ 2) * sigma2)
} else {
## return full variance-covariance matrix of predicted values
return(crossprod(Qt) * sigma2)
}
}
``````

With `diag = TRUE`, only diagonals of the variance-covariance are returned. These are point-wise variance, sufficient to produce point-wise confidence or prediction interval. Note, computing these diagonals do not need explicit formation of full hat matrix.

With `diag = FALSE`, full hat matrix will be returned by a more expensive matrix multiplication. But it is still faster than the trivial way based on `vcov`.

Here, we want:

``````## full variance-covariance matrix
VCOV <- fast_vcov(lm.tree, Xp, diag = FALSE)
#            [,1]      [,2]       [,3]       [,4]
#[1,]  0.89022938 0.3846809 0.04967582 -0.1147858
#[2,]  0.38468089 0.5369327 0.52828797  0.3587467
#[3,]  0.04967582 0.5282880 0.73113553  0.6582185
#[4,] -0.11478583 0.3587467 0.65821848  0.7836294
``````

As you can see, it is exactly as same as `VCOV1` obtained from the trivial method.

Obtaining predicted values is easy:

``````pred <- drop(Xp %*% coef(lm.tree))
``````

In case you feel interested, you can compare those values with what is returned by `predict.lm`:

``````pr <- predict(lm.tree, newdat, se.fit = TRUE)

pr\$fit
#       1        2        3        4
#15.31863 22.33400 31.38568 42.47365

pred
#       1        2        3        4
#15.31863 22.33400 31.38568 42.47365

pr\$se.fit
#        1         2         3         4
#0.9435197 0.7327569 0.8550646 0.8852284

sqrt(diag(VCOV))
# 0.9435197 0.7327569 0.8550646 0.8852284
``````

So, we are definitely doing correct.

Finally, the sum of all predicted values is

``````sum(pred)
# 111.512
``````

with variance

``````sum(VCOV)
# 6.671575
``````

# Closing Note

Note that in above, prediction matrix `Xp` is not constructed via `with(newdat, poly(Girth, 2))` or `model.matrix(~poly(Girth, 2), newdat)`, as we don't want to construct a new polynomial basis; rather, we want to evaluate the polynomial basis used for model fitting at new covariate values.

It is because of this, it can be quite a pain. `predict.lm` does not support the return of linear predictor matrix. Some advanced packages, like `mgcv`, supports `type = "lpmatrix"` in `predict`. Why is this helpful? As mentioned above, when we have smooth functions in the model, prediction matrix can not be constructed via `model.matrix`. This is not only true for `poly` but also for `splines::bs`. It is a pain in the ass for user to construct those prediction matrix manually. For example, if we've fitted a model like:

``````y ~ poly(x1, 2) + bs(x2, 5) + x3 + x4 : f
``````

where `x1` to `x4` are numeric covariates and `f` is a factor. When we make prediction, prediction matrix for the "linear" part `~ x3 + x4:f` can be constructed safely via `model.matrix`, while the "non-linear" part: `~ poly(x1, 2) + bs(x2, 5)` should be obtained by predict methods for `poly` and `bs`. Imagine how difficult it is for us user, to manually generate prediction matrix for different class of smooth terms as well as linear terms, then column bind them together as the final matrix!!

Note my function `fast_vcov` takes argument `Xp`. I wish I could feed it with a `newdata` argument, but I could not find an easy way to automatically generate all prediction matrices correctly inside the function. So, it becomes user's responsibility to obtain the right prediction matrix themselves.

Note, even if we want to use the trivial method: `Xp %*% vcov(lmObject) %*% t(Xp)`, we still need `Xp`. So either way, we need to struggle a bit.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download