dleal - 21 days ago
R Question

Transforming variables for a linear model in R

Is it good practice to transform variables inside the formula definition of a linear model?

For example:

reg1 <- lm(log(Y) ~ X + Z + (W)^2, data = data)

I only have W, X, Y and Z in data, not the transformed variables. Also, I don't see W^2 listed when I call summary(reg1).

Thank you,


A quick search did not turn up a duplicate with an answer, so here is one:

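The part of an lm() call where you specify the regression equation is called a formula. Formulas give operators such as +, ^, * and : their own meanings, so you cannot use them for arithmetic inside a formula. In particular, ^ means "cross these terms up to this order": (a + b)^2 expands to a + b + a:b, and (W)^2, having only one term to cross, reduces to plain W. That is why no W^2 term appears in summary(reg1).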
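You can check how R reads a formula without fitting anything by asking terms() for its term labels. This small sketch (lab1 and lab2 are just illustrative names) shows that (W)^2 collapses to plain W, whereas ^ applied to a sum of terms expands into interactions:

```r
# Term labels R derives from the question's formula; no data is needed
lab1 <- attr(terms(log(Y) ~ X + Z + (W)^2), "term.labels")
lab1
# "X" "Z" "W"

# ^ means crossing up to the given order, so a sum expands into interactions
lab2 <- attr(terms(y ~ (a + b)^2), "term.labels")
lab2
# "a" "b" "a:b"
```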

To do arithmetic inside a formula, wrap the expression in the I() function, as @jogo suggests (see ?I in R for its description):

reg1 <- lm(log(Y) ~ X + Z + I(W^2), data = data)

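I() prevents R from interpreting the enclosed operators as formula operators, so they are evaluated as ordinary arithmetic instead. Ordinary function calls such as log(Y) are not formula operators, so they are evaluated directly and need no I(); that is why the log transformation in your model already works.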
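To see the difference on a fitted model, here is a minimal sketch on simulated data. The data-generating values are made up purely for illustration; only the column names W, X, Y and Z come from the question. The original formula silently fits a linear W term, while I(W^2) fits the squared term and gives the same coefficients as precomputing the column yourself:

```r
# Hypothetical simulated data using the question's column names
set.seed(1)
n <- 100
data <- data.frame(X = rnorm(n), Z = rnorm(n), W = runif(n, 1, 3))
# Y is generated via exp() so that log(Y) is defined
data$Y <- exp(0.5 + 0.3 * data$X - 0.2 * data$Z + 0.4 * data$W^2 +
                rnorm(n, sd = 0.1))

# Original formula: (W)^2 is formula syntax, so only W enters the model
reg0 <- lm(log(Y) ~ X + Z + (W)^2, data = data)
names(coef(reg0))
# "(Intercept)" "X" "Z" "W"

# With I(), W^2 is computed arithmetically before fitting
reg1 <- lm(log(Y) ~ X + Z + I(W^2), data = data)
names(coef(reg1))
# "(Intercept)" "X" "Z" "I(W^2)"

# Same model with the squared term stored as its own column
data$W2 <- data$W^2
reg2 <- lm(log(Y) ~ X + Z + W2, data = data)
all.equal(unname(coef(reg1)), unname(coef(reg2)))
# TRUE
```

One practical advantage of transforming inside the formula: predict(reg1, newdata = ...) evaluates I(W^2) on the new data itself, so you only need to supply the raw W column (keeping in mind that predictions are on the log(Y) scale).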

Another place where R interprets your input in a perhaps unintuitive way is data.frame(). If you pass a list as one of its arguments, data.frame() splits it into atomic vector columns, one per list element:

li = list(x = 1:3, y = 11:13, z = 21:23)
data.frame(a = 5:7, b = li)
#   a b.x b.y b.z
# 1 5   1  11  21
# 2 6   2  12  22
# 3 7   3  13  23

This too can be avoided with I(), which suppresses that splitting and stores the whole list as a single list column instead:

data.frame(a = 5:7, b = I(li))
#   a          b
# x 5    1, 2, 3
# y 6 11, 12, 13
# z 7 21, 22, 23
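A quick way to confirm the difference is to inspect the column structure of both results (df1 and df2 are just illustrative names):

```r
li <- list(x = 1:3, y = 11:13, z = 21:23)

# Without I(): each list element becomes its own column
df1 <- data.frame(a = 5:7, b = li)
names(df1)
# "a" "b.x" "b.y" "b.z"

# With I(): the whole list is kept as one list column
df2 <- data.frame(a = 5:7, b = I(li))
names(df2)
# "a" "b"
is.list(df2$b)
# TRUE
df2$b[[2]]
# 11 12 13
```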