merlin2011 - 1 year ago 56

R Question

I have read through the manual page

`?poly`

`Introduction to Statistical Learning`

My current understanding is that a call to

`poly(horsepower, 2)`

`horsepower + I(horsepower^2)`

`library(ISLR)`

summary(lm(mpg~poly(horsepower,2), data=Auto))$coef

summary(lm(mpg~horsepower+I(horsepower^2), data=Auto))$coef

Output:

`Estimate Std. Error t value Pr(>|t|)`

(Intercept) 23.44592 0.2209163 106.13030 2.752212e-289

poly(horsepower, 2)1 -120.13774 4.3739206 -27.46683 4.169400e-93

poly(horsepower, 2)2 44.08953 4.3739206 10.08009 2.196340e-21

Estimate Std. Error t value Pr(>|t|)

(Intercept) 56.900099702 1.8004268063 31.60367 1.740911e-109

horsepower -0.466189630 0.0311246171 -14.97816 2.289429e-40

I(horsepower^2) 0.001230536 0.0001220759 10.08009 2.196340e-21

My question is, why does the output not match, and what is

`poly`

Answer Source

When introducing polynomial terms in a statistical model the usual motivation is to determine whether the response is "curved" and whether the curvature is "significant" when that term is added in. The upshot of throwing in an `+I(x^2)`

terms is that minor deviations may get "magnified" by the fitting process depending on their location, and misinterpreted as due to the curvature term when they were just fluctuations at one end or other of the range of data. This results in inappropriate assignment of declarations of "significance".

If you just throw in a squared term with `I(x^2)`

, of necessity it's also going to be highly correlated with x at least in the domain where `x > 0`

. Using instead: `poly(x,2)`

creates a "curved" set of variables where the linear term is not so highly correlated with x, and where the curvature is roughly the same across the range of data. (If you want to read up on the statistical theory, search on "orthogonal polynomials".) Just type `poly(1:10, 2)`

and look at the two columns.

```
poly(1:10, 2)
1 2
[1,] -0.49543369 0.52223297
[2,] -0.38533732 0.17407766
[3,] -0.27524094 -0.08703883
[4,] -0.16514456 -0.26111648
[5,] -0.05504819 -0.34815531
[6,] 0.05504819 -0.34815531
[7,] 0.16514456 -0.26111648
[8,] 0.27524094 -0.08703883
[9,] 0.38533732 0.17407766
[10,] 0.49543369 0.52223297
attr(,"degree")
[1] 1 2
attr(,"coefs")
attr(,"coefs")$alpha
[1] 5.5 5.5
attr(,"coefs")$norm2
[1] 1.0 10.0 82.5 528.0
attr(,"class")
[1] "poly" "matrix"
```

The "quadratic" term is centered on 5.5 and the linear term has been shifted down so it is 0 at the same x-point (with the implicit `(Intercept)`

term in the model being depended upon for shifting everything back at the time predictions are requested.)