Question

linear model with `lm`: how to get prediction variance of sum of predicted values

I'm summing the predicted values from a linear model with multiple predictors, as in the example below, and want to calculate the combined variance, standard error and possibly confidence intervals for this sum.

lm.tree <- lm(Volume ~ poly(Girth,2), data = trees)


Suppose I have a set of `Girth` values:

newdat <- list(Girth = c(10,12,14,16))


for which I want to predict the total `Volume`:

pr <- predict(lm.tree, newdat, se.fit = TRUE)
total <- sum(pr$fit)
# [1] 111.512


How can I obtain the variance for `total`?

Similar questions exist (for GAMs), but I'm not sure how to proceed with `vcov(lm.tree)`. I'd be grateful for a reference for the method.

Answer

Basically you need to obtain the full variance-covariance matrix of the predicted values, then sum all its elements. Here is a small proof:

$$\operatorname{Var}\left(\sum_{i}\hat y_i\right) = \operatorname{Var}\left(\mathbf{1}^\top \hat{\mathbf y}\right) = \mathbf{1}^\top \operatorname{Var}\!\left(\hat{\mathbf y}\right)\mathbf{1} = \sum_{i}\sum_{j} \operatorname{Cov}\!\left(\hat y_i, \hat y_j\right)$$

The proof uses another theorem, which you can find on the Wikipedia page for covariance:

$$\operatorname{Cov}\left(\sum_{i} a_i X_i,\ \sum_{j} b_j Y_j\right) = \sum_{i}\sum_{j} a_i b_j \operatorname{Cov}\!\left(X_i, Y_j\right)$$

Specifically, the linear transform we take is a column vector of all 1's. The resulting quadratic form is computed as follows, with all $a_i$ and $a_j$ equal to 1:

$$\operatorname{Var}\left(\sum_{i} a_i X_i\right) = \sum_{i}\sum_{j} a_i a_j \operatorname{Cov}\!\left(X_i, X_j\right) = \sum_{i}\sum_{j} \operatorname{Cov}\!\left(X_i, X_j\right)$$
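
If you want a quick numerical sanity check of this identity (a simulation sketch, not part of the proof), note that the sample variance of a row sum equals the sum of all entries of the sample covariance matrix:

set.seed(1)
## 100 observations of 4 correlated variables
Z <- matrix(rnorm(400), 100, 4) %*% matrix(runif(16), 4, 4)
var(rowSums(Z))  ## sample variance of the sum
sum(var(Z))      ## sum of all entries of the sample covariance matrix
## the two agree up to floating point error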


Obtain the full variance-covariance matrix for prediction

The most trivial way to obtain the variance-covariance matrix is based on `vcov`. For your fitted model `lm.tree`, prediction data `newdat` and prediction matrix `Xp`:

## your model
lm.tree <- lm(Volume ~ poly(Girth,2), data = trees)

## newdata
newdat <- list(Girth = c(10,12,14,16))

## prediction matrix
poly_Xp <- poly(unlist(newdat), degree = 2,
                coefs = attr(lm.tree$model$"poly(Girth, 2)", "coefs"),
                simple = TRUE)
Xp <- cbind(1, poly_Xp)
#                 1          2
#[1,] 1 -0.18898835  0.1098920
#[2,] 1 -0.07263008 -0.1036759
#[3,] 1  0.04372819 -0.1776927
#[4,] 1  0.16008645 -0.1121585

We can do:

VCOV1 <- Xp %*% vcov(lm.tree) %*% t(Xp)
#            [,1]      [,2]       [,3]       [,4]
#[1,]  0.89022938 0.3846809 0.04967582 -0.1147858
#[2,]  0.38468089 0.5369327 0.52828797  0.3587467
#[3,]  0.04967582 0.5282880 0.73113553  0.6582185
#[4,] -0.11478583 0.3587467 0.65821848  0.7836294

But a much more efficient way is via my function:

fast_vcov <- function (lmObject, Xp, diag = TRUE) {
  ## compute Qt = R^{-T} Xp^T from the QR factorization of the model matrix
  QR <- lmObject$qr  ## qr object of the fitted model
  piv <- QR$pivot    ## pivoting index
  r <- QR$rank       ## model rank / numeric rank
  if (is.unsorted(piv)) {
    ## pivoting has been done
    Qt <- forwardsolve(t(QR$qr), t(Xp[, piv]), r)
  } else {
    ## no pivoting is done
    Qt <- forwardsolve(t(QR$qr), t(Xp), r)
  }
  ## residual variance
  sigma2 <- sum(residuals(lmObject)^2) / df.residual(lmObject)
  ## return
  if (diag) {
    ## return point-wise prediction variance;
    ## no need to compute the full variance-covariance matrix
    return(colSums(Qt^2) * sigma2)
  } else {
    ## return full variance-covariance matrix of predicted values
    return(crossprod(Qt) * sigma2)
  }
}

With `diag = TRUE`, only the diagonal of the variance-covariance matrix is returned. These are the point-wise prediction variances, sufficient to produce point-wise confidence or prediction intervals. Note that computing these diagonals does not require explicit formation of the full matrix.
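
For example, here is a minimal sketch of t-based point-wise 95% confidence intervals built from these variances (the interval construction below is standard, not part of the original function):

pv <- fast_vcov(lm.tree, Xp, diag = TRUE)  ## point-wise variances (= pr$se.fit ^ 2)
fit <- drop(Xp %*% coef(lm.tree))          ## predicted values
crit <- qt(0.975, df.residual(lm.tree))    ## t critical value
cbind(lwr = fit - crit * sqrt(pv), upr = fit + crit * sqrt(pv))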

With `diag = FALSE`, the full variance-covariance matrix is returned via a more expensive matrix product (`crossprod(Qt)`). But it is still faster than the trivial way based on `vcov`.

Here, we want:

## full variance-covariance matrix
VCOV <- fast_vcov(lm.tree, Xp, diag = FALSE)
#            [,1]      [,2]       [,3]       [,4]
#[1,]  0.89022938 0.3846809 0.04967582 -0.1147858
#[2,]  0.38468089 0.5369327 0.52828797  0.3587467
#[3,]  0.04967582 0.5282880 0.73113553  0.6582185
#[4,] -0.11478583 0.3587467 0.65821848  0.7836294

As you can see, it is exactly the same as `VCOV1` obtained from the trivial method.

Obtaining predicted values is easy:

pred <- drop(Xp %*% coef(lm.tree))

In case you are interested, you can compare these values with what is returned by `predict.lm`:

pr <- predict(lm.tree, newdat, se.fit = TRUE)

pr$fit
#       1        2        3        4 
#15.31863 22.33400 31.38568 42.47365 

pred
#       1        2        3        4 
#15.31863 22.33400 31.38568 42.47365 

pr$se.fit
#        1         2         3         4 
#0.9435197 0.7327569 0.8550646 0.8852284 

sqrt(diag(VCOV))
# 0.9435197 0.7327569 0.8550646 0.8852284

So we are definitely doing this correctly.


Finally, the sum of all predicted values is

sum(pred)
# 111.512

with variance

sum(VCOV)
# 6.671575
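
The standard error and a t-based 95% confidence interval for this sum, which the question also asks for, then follow directly (a standard construction, sketched here rather than taken from `predict.lm`):

## standard error of the sum
se_total <- sqrt(sum(VCOV))
# ~2.5829
## t-based 95% confidence interval for the expected total volume
sum(pred) + c(-1, 1) * qt(0.975, df.residual(lm.tree)) * se_total
# roughly (106.2, 116.8)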

Closing Note

Note that in the above, the prediction matrix `Xp` is not constructed via `with(newdat, poly(Girth, 2))` or `model.matrix(~ poly(Girth, 2), newdat)`, as we don't want to construct a new polynomial basis; rather, we want to evaluate the polynomial basis used for model fitting at the new covariate values, which is why the `coefs` attribute is passed to `poly`.
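
An equivalent, slightly less manual route (a sketch relying on `predict.poly` from base R's stats package): the `poly` object stored in the model frame carries the fitting coefficients, and its `predict` method evaluates that same basis at new values.

## evaluate the polynomial basis used at fitting time via predict.poly
Xp2 <- cbind(1, predict(lm.tree$model$"poly(Girth, 2)", newdat$Girth))
all.equal(Xp, Xp2, check.attributes = FALSE)
## should be TRUE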

It is because of this that it can be quite a pain. `predict.lm` does not support returning the linear predictor matrix. Some advanced packages, like mgcv, support `type = "lpmatrix"` in their `predict` methods. Why is this helpful? As mentioned above, when we have smooth functions in the model, the prediction matrix cannot be constructed via `model.matrix`. This is true not only for `poly` but also for `splines::bs`. It is a real pain for users to construct those prediction matrices manually. For example, suppose we've fitted a model like:

y ~ poly(x1, 2) + bs(x2, 5) + x3 + x4 : f

where `x1` to `x4` are numeric covariates and `f` is a factor. When we make predictions, the prediction matrix for the "linear" part `~ x3 + x4:f` can be constructed safely via `model.matrix`, while the "non-linear" part `~ poly(x1, 2) + bs(x2, 5)` should be obtained via the `predict` methods for `poly` and `bs`. Imagine how difficult it is for us users to manually generate prediction matrices for the different classes of smooth terms as well as the linear terms, then column-bind them together into the final matrix (see the sketch below)!
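
To make this concrete, here is a rough sketch for that hypothetical model, assuming it was fitted as `fit <- lm(y ~ poly(x1, 2) + bs(x2, 5) + x3 + x4:f, data = dat)` and that `newdat` holds the new covariate values; none of these objects exist in this answer, and the columns must end up in the same order as `names(coef(fit))`:

## "non-linear" parts: evaluate the bases used at fitting time
Xp_poly <- predict(fit$model$"poly(x1, 2)", newdat$x1)  ## predict.poly
Xp_bs   <- predict(fit$model$"bs(x2, 5)",  newdat$x2)   ## predict.bs from splines
## "linear" parts (intercept, x3, x4:f); factor levels in newdat$f
## must match those used during fitting
Xp_lin  <- model.matrix(~ x3 + x4:f, newdat)
## column-bind in the order of the model formula:
## intercept, poly terms, bs terms, then the remaining linear terms
Xp <- cbind(Xp_lin[, 1, drop = FALSE], Xp_poly, Xp_bs, Xp_lin[, -1])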

Note that my function `fast_vcov` takes the argument `Xp`. I wish I could feed it a `newdata` argument instead, but I could not find an easy way to automatically generate all prediction matrices correctly inside the function. So it becomes the user's responsibility to obtain the right prediction matrix.

Note that even if we want to use the trivial method, `Xp %*% vcov(lmObject) %*% t(Xp)`, we still need `Xp`. So either way, we have to struggle a bit.