DeltaIV DeltaIV - 3 months ago 34
R Question

Does geom_smooth() of ggplot2 show pointwise confidence bands, or simultaneous confidence bands?

I was unsure whether this question would be more appropriate here or on Cross Validated. I hope I made the right choice.

Consider the example:

library(dplyr)
setosa <- iris %>% filter(Species == "setosa") %>% select(Sepal.Length, Sepal.Width, Species)
library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_smooth(method ="lm", formula = y ~ poly(x,2))


enter image description here
By default,
ggplot
"displays confidence interval around smooth" (see here), given by the gray area around the regression curve. I've always assumed these are simultaneous confidence bands for the regression curve, not pointwise confidence bands.
ggplot2
documentation refers to the
predict
function for details on how the standard errors are computed. However, reading the doc for predict.lm, it doesn't say explicitly that simultaneous confidence bands are computed. So, what is the correct interpretation here?

Answer

One way to check what predict.lm() computes is to inspect the code (predict multiplies standard errors by qt((1 - level)/2, df), and so does not appear to make adjustments for simultaneous inference). Another way is to construct simultaneous confidence intervals and compare them against predict's intervals.

Fit the model and construct simultaneous confidence intervals:

setosa <- subset(iris, Species == "setosa")
setosa <- setosa[order(setosa$Sepal.Length), ]
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), setosa)

K <- cbind(1, poly(setosa$Sepal.Length, 2))
cht <- multcomp::glht(fit, linfct = K)
cci <- confint(cht)

Reshape and plot:

cc <- as.data.frame(cci$confint)
cc$Sepal.Length <- setosa$Sepal.Length
cc <- reshape2::melt(cc[, 2:4], id.var = "Sepal.Length")

library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method ="lm", formula = y ~ poly(x,2)) +
  geom_line(data = cc, 
            aes(x = Sepal.Length, y = value, group = variable),
            colour = "red")

It appears that predict(.., interval = "confidence") does not produce simultaneous confidence intervals:

enter image description here