Is it good practice to transform variables in the formula definition of a linear model?
reg1 <- lm(log(Y) ~ X + Z + (W)^2, data = data)
Since a quick search did not reveal a duplicate (that would have an answer), here is one:
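The part of lm where you specify your regression equation is called a formula. Formulas use operators (like + and ^) in their own way, so you cannot use them directly to do arithmetic within the formula. In a formula, ^ means crossing terms up to the given degree, so (W)^2 expands to just W and the squared term is silently dropped.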
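To see how formula operators are expanded, one option is to inspect the term labels that terms() derives from a formula. This is only a small sketch; the variable names y, a, b and W are placeholders, and no data is needed because the formulas are never evaluated.

```r
# Helper (hypothetical name) returning the expanded terms of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

crossed  <- term_labels(y ~ (a + b)^2)  # "a" "b" "a:b": ^ crosses terms
starred  <- term_labels(y ~ a * b)      # "a" "b" "a:b": * is main effects plus interaction
removed  <- term_labels(y ~ a + b - b)  # "a": - removes a term
squared  <- term_labels(y ~ (W)^2)      # "W": no squared term is created

print(list(crossed = crossed, starred = starred,
           removed = removed, squared = squared))
```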
To do arithmetic within a formula, you need to use the I function, as @jogo suggests (see ?I for its description in R), like so:
reg1 <- lm(log(Y) ~ X + Z + I(W^2), data = data)
This prevents R from interpreting the operators as formula operators, so they are interpreted as arithmetic operators instead.
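A quick way to check the difference is to fit both versions and compare the coefficient names. The sketch below uses simulated data, since the question's data is not shown; the column names follow the question, but the values and the true coefficients are made up for illustration.

```r
# Simulated stand-in for the question's data (values are assumptions).
set.seed(1)
data <- data.frame(X = rnorm(20), Z = rnorm(20), W = rnorm(20))
data$Y <- exp(1 + 0.5 * data$X - 0.3 * data$Z + 0.2 * data$W^2 +
              rnorm(20, sd = 0.1))

# Without I(): ^ is a formula operator, so the model only gets a linear W term.
reg_plain <- lm(log(Y) ~ X + Z + (W)^2, data = data)

# With I(): ^ is ordinary arithmetic, so the model gets a squared W term.
reg_I <- lm(log(Y) ~ X + Z + I(W^2), data = data)

names(coef(reg_plain))  # "(Intercept)" "X" "Z" "W"
names(coef(reg_I))      # "(Intercept)" "X" "Z" "I(W^2)"
```

Note that log(Y) on the left-hand side needs no I(), because a function call is evaluated as ordinary R code; only the operators that have a special formula meaning need protecting.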
Another example where R interprets your input in a perhaps non-intuitive way is the function data.frame. If you pass a list as one of the columns of a data.frame, the list is split up and each of its elements becomes a separate atomic-vector column:
li = list(x = 1:3, y = 11:13, z = 21:23)
data.frame(a = 5:7, b = li)
#   a b.x b.y b.z
# 1 5   1  11  21
# 2 6   2  12  22
# 3 7   3  13  23
This too can be avoided using I, which inhibits this special interpretation of the list as one or more atomic vectors and creates a list column instead:
data.frame(a = 5:7, b = I(li))
#   a          b
# x 5    1, 2, 3
# y 6 11, 12, 13
# z 7 21, 22, 23
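The structural difference between the two results can also be checked programmatically. The following sketch rebuilds both data frames (the names df_split and df_list are just illustrative) and inspects their columns.

```r
li <- list(x = 1:3, y = 11:13, z = 21:23)

df_split <- data.frame(a = 5:7, b = li)     # list spread over several columns
df_list  <- data.frame(a = 5:7, b = I(li))  # list kept as a single column

names(df_split)              # "a" "b.x" "b.y" "b.z"
names(df_list)               # "a" "b"

# The protected column is still a list, marked with the "AsIs" class.
is.list(df_list$b)           # TRUE
inherits(df_list$b, "AsIs")  # TRUE

# Each cell of the list column holds a whole vector.
second_cell <- df_list$b[[2]]
second_cell                  # 11 12 13
```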