dleal - 2 months ago 8

R Question

Is it good practice to transform variables in the formula definition of a linear model?

For example:

`reg1 <- lm(log(Y) ~ X + Z + (W)^2, data = data)`

when I only have W, X, Y, Z in data and not the transformed variables? I don't see W^2 listed when I call summary of reg1.

Thank you,

Answer

Since a quick search did not reveal a duplicate (that would have an answer), here is one:

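The part of `lm` where you specify your regression equation is called a **formula**. Formulas interpret operators (like `^`, `+`) in their own way, so you cannot use them directly to do arithmetic within a formula. In particular, `^` denotes crossing terms up to the given degree, so `(W)^2` reduces to just `W`, which is why no squared term shows up in `summary(reg1)`.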
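To see how a formula reads `^`, you can inspect the parsed terms without fitting anything. This is only a small sketch; the names `y`, `a` and `b` are made up for illustration and do not refer to the data in the question:

```r
# Hypothetical variable names y, a, b. terms() only parses the formula,
# so no data is needed here.
crossed <- attr(terms(y ~ (a + b)^2), "term.labels")
crossed
# "a" "b" "a:b"   (main effects plus their interaction, no squared terms)

single <- attr(terms(y ~ a + b^2), "term.labels")
single
# "a" "b"         (b^2 collapses to plain b)
```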

To do arithmetic within a formula, you need to wrap the expression in the `I` function, as @jogo suggests (see `?I` for its description in R), like so:

```
reg1 <- lm(log(Y) ~ X + Z + I(W^2), data = data)
```

This prevents R from interpreting the operators as *formula operators*, so they are interpreted as *arithmetic operators* instead.
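A quick way to check the difference is to compare the coefficient names of both fits. The sketch below uses simulated data, since the original `data` is not available; the data frame `d` and the numbers used to generate it are assumptions made purely for illustration:

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
d <- data.frame(X = rnorm(50), Z = rnorm(50), W = runif(50, 1, 3))
d$Y <- exp(0.5 + 0.3 * d$X - 0.2 * d$Z + 0.1 * d$W^2 + rnorm(50, sd = 0.1))

# Without I(): ^ is a formula operator, so (W)^2 is treated as plain W.
fit_plain <- lm(log(Y) ~ X + Z + (W)^2, data = d)

# With I(): ^ is ordinary arithmetic, so the model gets a squared term.
fit_asis <- lm(log(Y) ~ X + Z + I(W^2), data = d)

names(coef(fit_plain))
# "(Intercept)" "X" "Z" "W"

names(coef(fit_asis))
# "(Intercept)" "X" "Z" "I(W^2)"
```

Note that `log(Y)` needs no `I`, because a function call inside a formula is evaluated as ordinary R code.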

Another example where R interprets your input in a perhaps non-intuitive way is the function `data.frame`. If you try to construct a data.frame with a list as one of its columns, the list is split into its atomic vectors, each of which becomes a separate column:

```
li <- list(x = 1:3, y = 11:13, z = 21:23)
data.frame(a = 5:7, b = li)
#   a b.x b.y b.z
# 1 5   1  11  21
# 2 6   2  12  22
# 3 7   3  13  23
```

This too can be avoided using `I`, which will **inhibit** this special interpretation of the list as one or more atomic vectors and create a list column instead:

```
data.frame(a = 5:7, b = I(li))
#   a          b
# x 5    1, 2, 3
# y 6 11, 12, 13
# z 7 21, 22, 23
```
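If you want to confirm what each call produced, a short check along these lines may help. The names `flat` and `nested` are just illustrative, and `li` is redefined so the snippet runs on its own:

```r
li <- list(x = 1:3, y = 11:13, z = 21:23)

flat   <- data.frame(a = 5:7, b = li)     # list split into b.x, b.y, b.z
nested <- data.frame(a = 5:7, b = I(li))  # list kept as a single column

ncol(flat)                  # 4
ncol(nested)                # 2
inherits(nested$b, "AsIs")  # TRUE
nested$b[[2]]               # 11 12 13
```

Each cell of the list column still holds the full vector, so `nested$b[[2]]` gives back the original `11:13`.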