Leonhardt Guass Leonhardt Guass - 1 month ago 17
R Question

How to write interactions in regressions in R?

DF <- data.frame(factor1=rep(1:4,1000), factor2 = rep(1:4,each=1000),base = rnorm(4000,0,1),dep=rnorm(4000,400,5))

DF$f1_1 = DF$factor1 == 1
DF$f1_2 = DF$factor1 == 2
DF$f1_3 = DF$factor1 == 3
DF$f1_4 = DF$factor1 == 4

DF$f2_1 = DF$factor2 == 1
DF$f2_2 = DF$factor2 == 2
DF$f2_3 = DF$factor2 == 3
DF$f2_4 = DF$factor2 == 4


I want to run the following regression:

Dep = (f1_1 + f1_2 + f1_3 + f1_4)*(f2_1 + f2_2 + f2_3 + f2_4)*(base+base^2+base^3+base^4+base^5)


I understand how to write it in the "lame" way. Is there a smarter way to do it?

Answer

You should code factor1 and factor2 as real factor variables. Also, it is better to use poly for polynomials. Here is what we can do:

DF <- data.frame(factor1=rep(1:4,1000), factor2 = rep(1:4,each=1000),
                 base = rnorm(4000,0,1), dep = rnorm(4000,400,5))

DF$factor1 <- as.factor(DF$factor1)
DF$factor2 <- as.factor(DF$factor2)

fit <- lm(dep ~ factor1 * factor2 * poly(base, degree = 5))

By default, poly generates orthogonal basis for numerical stability. If you want ordinary polynomials like base + base ^ 2 + base ^ 3 + ..., use poly(base, degree = 5, raw = TRUE).

Be aware, you will get lots of parameters from this model, as you are fitting a fifth order polynomial for each pair of levels between factor1 and factor2.


Consider a small example.

set.seed(0)
f1 <- sample(gl(3, 20, labels = letters[1:3]))    ## randomized balanced factor
f2 <- sample(gl(3, 20, labels = LETTERS[1:3]))    ## randomized balanced factor
x <- runif(3 * 20)  ## numerical covariate
y <- rnorm(3 * 20)  ## toy response

fit <- lm(y ~ f1 * f2 * poly(x, 2))

#Call:
#lm(formula = y ~ f1 * f2 * poly(x, 2))
#
#Coefficients:
#        (Intercept)                  f1b                  f1c  
#            -0.5387               0.8776               0.1572  
#                f2B                  f2C          poly(x, 2)1  
#             0.5113               1.0139               5.8345  
#        poly(x, 2)2              f1b:f2B              f1c:f2B  
#             2.4373               1.0666               0.1372  
#            f1b:f2C              f1c:f2C      f1b:poly(x, 2)1  
#            -1.4951              -1.4601              -6.2338  
#    f1c:poly(x, 2)1      f1b:poly(x, 2)2      f1c:poly(x, 2)2  
#           -11.0760              -2.3668               1.9708  
#    f2B:poly(x, 2)1      f2C:poly(x, 2)1      f2B:poly(x, 2)2  
#            -3.7127              -5.8253               5.6227  
#    f2C:poly(x, 2)2  f1b:f2B:poly(x, 2)1  f1c:f2B:poly(x, 2)1  
#            -7.3582              20.9179              11.6270  
#f1b:f2C:poly(x, 2)1  f1c:f2C:poly(x, 2)1  f1b:f2B:poly(x, 2)2  
#             1.2897              11.2041              12.8096  
#f1c:f2B:poly(x, 2)2  f1b:f2C:poly(x, 2)2  f1c:f2C:poly(x, 2)2  
#            -9.8476              10.6664               4.5582  

Note, even for 3 factor levels each and a 3rd order polynomial, we already end up with great number of coefficients.