Erdogan CEVHER - 1 year ago 59

# Why does the order of the variables change the anova results in this concrete R example?

I am trying to understand why the following two `anova` calls give different results, even though both models use the same covariates and differ only in the order of the variables.

``````
library(PASWR2)
#  age    ht    wt abs triceps subscap hwfat tanfat skfat
# 1  18 65.75 133.6   8       6    10.5 10.71   11.9  9.80
# 2  15 65.50 129.0  10       8     9.0  8.53   10.0 10.56
# ...
# 77  15 68 153.8  13       7      11 10.07   16.7 11.77
# 78  15 66 258.6  45      37      43 33.75   34.5 38.93

mod1.HSW <- lm(hwfat ~ abs + triceps + subscap, data = HSWRESTLER)
anova(mod1.HSW)
# Analysis of Variance Table
# Response: hwfat
#            Df Sum Sq Mean Sq F value    Pr(>F)
# abs        1 5072.8  5072.8 535.858 < 2.2e-16 ***
# triceps    1  242.2   242.2  25.581 2.984e-06 ***
# subscap    1    2.2     2.2   0.237    0.6278
# Residuals 74  700.5     9.5
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

mod2.HSW <- lm(hwfat ~ subscap + triceps + abs, data = HSWRESTLER)
anova(mod2.HSW) # ANOVA
# Analysis of Variance Table
# Response: hwfat
#           Df Sum Sq Mean Sq F value    Pr(>F)
# subscap    1 4939.0  4939.0 521.720 < 2.2e-16 ***
# triceps    1  204.6   204.6  21.616 1.422e-05 ***
# abs        1  173.6   173.6  18.341 5.473e-05 ***
# Residuals 74  700.5     9.5
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
``````

The `?anova` help file does not explain clearly what is going on, and the similar Stack Overflow questions do not work through a concrete example like this one. The variables in the regressions are continuous (covariates). The order of the variables seems to matter, but what does that order mean? What can be inferred from the two `anova` tables above?

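When you call `anova` on an `lm` model fit, the method that actually runs is `anova.lm` (see `?anova.lm`), which according to the documentation "gives a sequential analysis of variance table for that fit." This is a `type I` ANOVA, in which the order of the variables matters: each row reports the reduction in the residual sum of squares from adding that term to a model that already contains the terms listed above it. In your first model `abs` comes first, so its row (5072.8) is what `abs` explains on its own. In your second model `abs` comes last, so its row (173.6) is only the portion of the regression that `abs` explains beyond `subscap` and `triceps`.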
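The sequential behaviour can be checked by hand. The sketch below is only an illustration: it uses the built-in `mtcars` data as a stand-in (an assumption, so that it runs without `PASWR2` installed), and the helper `rss()` is a hypothetical name introduced here. It fits the nested models one term at a time and compares the drops in residual sum of squares with the `Sum Sq` column of `anova()`.

```r
# Stand-in data: built-in mtcars (an assumption; not the HSWRESTLER data)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
tab <- anova(fit)

# Hypothetical helper: residual sum of squares of a model given its formula
rss <- function(f) deviance(lm(f, data = mtcars))

# Nested models, adding one term at a time in the order of the formula
rss0 <- rss(mpg ~ 1)
rss1 <- rss(mpg ~ wt)
rss2 <- rss(mpg ~ wt + hp)
rss3 <- rss(mpg ~ wt + hp + disp)

# Sequential (type I) sums of squares are the successive drops in RSS
seq_ss <- c(wt = rss0 - rss1, hp = rss1 - rss2, disp = rss2 - rss3)

# Side-by-side comparison with the anova() table
print(cbind(anova = tab[["Sum Sq"]][1:3], by_hand = seq_ss))
```

Reordering the formula changes which nested models are compared, which is why the same variable can get a very different row in the table.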

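You can get a `type II` ANOVA with `drop1()`. Here the order doesn't matter: each term is dropped in turn from the full model, so its sum of squares measures what that predictor contributes after all the others are already included. Note that the value for `abs` (173.629) matches the last row of your second `anova` table, and the value for `subscap` (2.244) matches the last row of your first one: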

``````
drop1(mod1.HSW)
#Single term deletions
#
#Model:
#hwfat ~ abs + triceps + subscap
#        Df Sum of Sq    RSS    AIC
#<none>               700.54 179.22
#abs      1   173.629 874.17 194.49
#triceps  1   111.837 812.38 188.77
#subscap  1     2.244 702.78 177.47

drop1(mod2.HSW)
#Single term deletions
#
#Model:
#hwfat ~ subscap + triceps + abs
#        Df Sum of Sq    RSS    AIC
#<none>               700.54 179.22
#subscap  1     2.244 702.78 177.47
#triceps  1   111.837 812.38 188.77
#abs      1   173.629 874.17 194.49
``````
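The same relationship can be verified on any data set. The following is a minimal sketch, again using the built-in `mtcars` data as a stand-in (an assumption, not the data from the question). It shows that the `drop1()` sums of squares do not depend on the order of the terms, and that the last row of each sequential `anova()` table equals the `drop1()` value for that term. Passing `test = "F"` to `drop1()` adds F statistics and p-values to the table.

```r
# Stand-in data: built-in mtcars (an assumption; not the HSWRESTLER data)
fit_a <- lm(mpg ~ wt + hp + disp, data = mtcars)
fit_b <- lm(mpg ~ disp + hp + wt, data = mtcars)  # same terms, reversed order

# Single-term deletions with F tests
d_a <- drop1(fit_a, test = "F")
d_b <- drop1(fit_b, test = "F")

# Marginal sums of squares, looked up by term name in both fits
terms_chk <- c("wt", "hp", "disp")
ss_a <- d_a[terms_chk, "Sum of Sq"]
ss_b <- d_b[terms_chk, "Sum of Sq"]
print(cbind(order_a = ss_a, order_b = ss_b))  # the two columns agree

# The last term of a sequential table is adjusted for all other terms,
# so it coincides with the drop1() value for that term
last_a <- anova(fit_a)["disp", "Sum Sq"]
last_b <- anova(fit_b)["wt", "Sum Sq"]
print(c(anova_last = last_a, drop1 = d_a["disp", "Sum of Sq"]))
print(c(anova_last = last_b, drop1 = d_b["wt", "Sum of Sq"]))
```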