dleal - 1 year ago 70
R Question

# transforming variables for a linear model in R

Is it good practice to transform variables directly in the formula definition of a linear model?

For example:

```r
reg1 <- lm(log(Y) ~ X + Z + (W)^2, data = data)
```

when I only have W, X, Y, and Z in `data`, not the transformed variables? I don't see W^2 listed when I call `summary(reg1)`.

Thank you,

Since a quick search did not turn up a duplicate with an answer, here is one:

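The part of the `lm` call where you specify the regression equation is called the formula. Formulas give operators such as `+` and `^` their own meaning, so you cannot use them for arithmetic inside a formula. In particular, `^` means "cross these terms up to the given degree": `(a + b)^2` expands to `a + b + a:b`, and `(W)^2` reduces to plain `W`. That is why no squared term shows up in `summary(reg1)`.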
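A quick way to see this expansion is to ask `terms()` for the term labels R derives from a formula. The variable names below (`y`, `a`, `b`, `W`) are placeholders for illustration only; `terms()` does not need the variables to exist:

```r
# Term labels show how R expands a formula before anything is fitted
crossed <- attr(terms(y ~ (a + b)^2), "term.labels")
crossed   # "a" "b" "a:b"  (^ means crossing, not squaring)

squared <- attr(terms(y ~ (W)^2), "term.labels")
squared   # "W"  (W crossed with itself is just W)

wrapped <- attr(terms(y ~ I(W^2)), "term.labels")
wrapped   # "I(W^2)"  (I() protects the arithmetic)
```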

To do arithmetic within a formula, wrap the expression in the `I` function, as @jogo suggests (see `?I` for its documentation):

```r
reg1 <- lm(log(Y) ~ X + Z + I(W^2), data = data)
```

`I()` stops R from treating the operators inside it as formula operators, so they are evaluated as ordinary arithmetic; the squared term then appears as `I(W^2)` in `summary(reg1)`.
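To check that `I(W^2)` really fits the squared variable, the sketch below uses simulated data (hypothetical values, with column names borrowed from the question) and compares the fit against one that uses a precomputed squared column `W2`, a name introduced here only for illustration:

```r
# Simulated (hypothetical) data with the column names from the question
set.seed(1)
n <- 100
data <- data.frame(W = rnorm(n), X = rnorm(n), Z = rnorm(n))
# Y is built to be strictly positive so that log(Y) is defined
data$Y <- exp(1 + 0.5 * data$X - 0.3 * data$Z + 0.2 * data$W^2 + rnorm(n, sd = 0.1))

# I() makes W^2 an arithmetic expression evaluated before fitting
reg1 <- lm(log(Y) ~ X + Z + I(W^2), data = data)
names(coef(reg1))  # "(Intercept)" "X" "Z" "I(W^2)"

# Same model with the squared column added to the data by hand
data$W2 <- data$W^2
reg2 <- lm(log(Y) ~ X + Z + W2, data = data)
all.equal(unname(coef(reg1)), unname(coef(reg2)))  # TRUE
```

Both approaches build the same design matrix, so the choice is mostly about convenience: `I()` keeps `data` unchanged and makes the transformation visible in the model output.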

Another place where R interprets your input in a perhaps non-intuitive way is the `data.frame` function. If you pass a list as a column, `data.frame` splits it into separate atomic vector columns, one per list element:

```r
li = list(x = 1:3, y = 11:13, z = 21:23)
data.frame(a = 5:7, b = li)
#   a b.x b.y b.z
# 1 5   1  11  21
# 2 6   2  12  22
# 3 7   3  13  23
```

`I` avoids this too: it suppresses the special treatment of the list as a set of atomic vectors and creates a single list column instead:

```r
data.frame(a = 5:7, b = I(li))
#   a          b
# x 5    1, 2, 3
# y 6 11, 12, 13
# z 7 21, 22, 23
```
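The printed output can make the two results look similar, so a small structural check may help. This sketch redefines `li` so it runs on its own; the object names `df_split` and `df_list` are just for illustration:

```r
li <- list(x = 1:3, y = 11:13, z = 21:23)

# Without I(): the list is spread into one column per element
df_split <- data.frame(a = 5:7, b = li)
names(df_split)     # "a" "b.x" "b.y" "b.z"

# With I(): the list is kept whole as a single list column
df_list <- data.frame(a = 5:7, b = I(li))
names(df_list)      # "a" "b"
is.list(df_list$b)  # TRUE
df_list$b[[2]]      # 11 12 13, the second element of li, stored in row 2
```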