Question

What does the R function `poly` really do?

I have read through the manual page (?poly), which I admit I did not completely comprehend, and also read the description of the function in the book Introduction to Statistical Learning.

My current understanding is that a call to poly(horsepower, 2) should be equivalent to writing horsepower + I(horsepower^2). However, this seems to be contradicted by the output of the following code.

library(ISLR)

summary(lm(mpg~poly(horsepower,2), data=Auto))$coef
summary(lm(mpg~horsepower+I(horsepower^2), data=Auto))$coef


Output:

                         Estimate   Std. Error    t value      Pr(>|t|)
(Intercept)              23.44592    0.2209163  106.13030 2.752212e-289
poly(horsepower, 2)1   -120.13774    4.3739206  -27.46683  4.169400e-93
poly(horsepower, 2)2     44.08953    4.3739206   10.08009  2.196340e-21

                         Estimate   Std. Error    t value      Pr(>|t|)
(Intercept)          56.900099702 1.8004268063   31.60367 1.740911e-109
horsepower           -0.466189630 0.0311246171  -14.97816  2.289429e-40
I(horsepower^2)       0.001230536 0.0001220759   10.08009  2.196340e-21


My question is, why does the output not match, and what is poly really doing?

Answer

When introducing polynomial terms into a statistical model, the usual motivation is to determine whether the response is "curved" and whether the curvature is "significant" when that term is added. The upshot of throwing in a raw +I(x^2) term is that minor deviations may get "magnified" by the fitting process, depending on where they fall, and misinterpreted as due to the curvature term when they were really just fluctuations at one end or the other of the range of the data. This results in inappropriate declarations of "significance".
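One concrete way to see this in the question's own data (a minimal sketch using base lm() and coef(); only the point estimates are shown): because the poly() columns are mutually orthogonal, the linear term's estimate does not move when the quadratic column is added, whereas with raw powers it shifts substantially once the highly correlated squared term enters.

library(ISLR)

# Orthogonal polynomials: same linear estimate with or without the quadratic column
coef(lm(mpg ~ poly(horsepower, 1), data = Auto))
coef(lm(mpg ~ poly(horsepower, 2), data = Auto))

# Raw powers: the linear estimate changes once I(horsepower^2) is added
coef(lm(mpg ~ horsepower, data = Auto))
coef(lm(mpg ~ horsepower + I(horsepower^2), data = Auto))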

If you just throw in a squared term with I(x^2), it is necessarily going to be highly correlated with x, at least over a domain where x > 0. Using poly(x, 2) instead creates a "curved" pair of variables in which the linear term is not so highly correlated with x, and in which the curvature is roughly the same across the range of the data. (If you want to read up on the statistical theory, search for "orthogonal polynomials".) Just type poly(1:10, 2) and look at the two columns.

poly(1:10, 2)
                1           2
 [1,] -0.49543369  0.52223297
 [2,] -0.38533732  0.17407766
 [3,] -0.27524094 -0.08703883
 [4,] -0.16514456 -0.26111648
 [5,] -0.05504819 -0.34815531
 [6,]  0.05504819 -0.34815531
 [7,]  0.16514456 -0.26111648
 [8,]  0.27524094 -0.08703883
 [9,]  0.38533732  0.17407766
[10,]  0.49543369  0.52223297
attr(,"degree")
[1] 1 2
attr(,"coefs")
attr(,"coefs")$alpha
[1] 5.5 5.5

attr(,"coefs")$norm2
[1]   1.0  10.0  82.5 528.0

attr(,"class")
[1] "poly"   "matrix"

The "quadratic" term is centered on 5.5 and the linear term has been shifted down so it is 0 at the same x-point (with the implicit (Intercept) term in the model being depended upon for shifting everything back at the time predictions are requested.)