merlin2011 - 8 months ago 34
R Question

# What does the R function `poly` really do?

I have read through the manual page

`?poly`
(which I admit I did not completely comphrehend) and also read the description of the function in book
`Introduction to Statistical Learning`
.

My current understanding is that a call to
`poly(horsepower, 2)`
should be equivalent to writing
`horsepower + I(horsepower^2)`
. However, this seems to be contradicted by the output of the following code.

``````library(ISLR)

summary(lm(mpg~poly(horsepower,2), data=Auto))\$coef
summary(lm(mpg~horsepower+I(horsepower^2), data=Auto))\$coef
``````

Output:

``````                               Estimate Std. Error   t value      Pr(>|t|)
(Intercept)            23.44592  0.2209163 106.13030 2.752212e-289
poly(horsepower, 2)1 -120.13774  4.3739206 -27.46683  4.169400e-93
poly(horsepower, 2)2   44.08953  4.3739206  10.08009  2.196340e-21
Estimate   Std. Error   t value      Pr(>|t|)
(Intercept)     56.900099702 1.8004268063  31.60367 1.740911e-109
horsepower      -0.466189630 0.0311246171 -14.97816  2.289429e-40
I(horsepower^2)  0.001230536 0.0001220759  10.08009  2.196340e-21
``````

My question is, why does the output not match, and what is
`poly`
really doing?

When introducing polynomial terms in a statistical model the usual motivation is to determine whether the response is "curved" and whether the curvature is "significant" when that term is added in. The upshot of throwing in an `+I(x^2)` terms is that minor deviations may get "magnified" by the fitting process depending on their location, and misinterpreted as due to the curvature term when they were just fluctuations at one end or other of the range of data. This results in inappropriate assignment of declarations of "significance".

If you just throw in a squared term with `I(x^2)`, of necessity it's also going to be highly correlated with x at least in the domain where `x > 0`. Using instead: `poly(x,2)` creates a "curved" set of variables where the linear term is not so highly correlated with x, and where the curvature is roughly the same across the range of data. (If you want to read up on the statistical theory, search on "orthogonal polynomials".) Just type `poly(1:10, 2)` and look at the two columns.

``````poly(1:10, 2)
1           2
[1,] -0.49543369  0.52223297
[2,] -0.38533732  0.17407766
[3,] -0.27524094 -0.08703883
[4,] -0.16514456 -0.26111648
[5,] -0.05504819 -0.34815531
[6,]  0.05504819 -0.34815531
[7,]  0.16514456 -0.26111648
[8,]  0.27524094 -0.08703883
[9,]  0.38533732  0.17407766
[10,]  0.49543369  0.52223297
attr(,"degree")
[1] 1 2
attr(,"coefs")
attr(,"coefs")\$alpha
[1] 5.5 5.5

attr(,"coefs")\$norm2
[1]   1.0  10.0  82.5 528.0

attr(,"class")
[1] "poly"   "matrix"
``````

The "quadratic" term is centered on 5.5 and the linear term has been shifted down so it is 0 at the same x-point (with the implicit `(Intercept)` term in the model being depended upon for shifting everything back at the time predictions are requested.)

Source (Stackoverflow)