bogdan.narcis - 1 month ago 4x
R Question

# Fitting a linear model with multiple LHS

I am new to R and I want to improve the following script with an

*apply
apply
, but I couldn't manage to use it). I want to use
lm
function on multiple independent variables (which are columns in a data frame). I used

for (i in (1:3) {
assign(paste0('lm.',names(data[i])), lm(formula=formula(i),data=data))
}

Formula(i)
is defined as

formula=function(x)
{
as.formula ( paste(names(data[x]),'~', paste0(names(data[-1:-3]), collapse = '+')), env=parent.frame() )
}

Thank you.

If I don't get you wrong, you are working with a dataset like this:

set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

x1, x2 and x3 are covariates, while y1, y2 and y3 are three independent response. You are trying to fit three linear models:

y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
y3 ~ x1 + x2 + x3

Currently you are using a loop through y1, y2, y3, fitting one model per time. And you hope to speed the process up by replacing the for loop with, say, lapply.

I would like to point out, that you are on the wrong track. lm() is an expensive operation. As long as your dataset is not small, the costs of the for loop is negligible. Replacing for loop with lapply gives no performance gains.

Since you have the same RHS for all three models, model matrix is the same for three models. Therefore, QR factorization for all models need only be done once. lm allows this, and you can use:

fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
#Coefficients:
#             y1         y2         y3
#(Intercept)  -0.081155   0.042049   0.007261
#x1           -0.037556   0.181407  -0.070109
#x2           -0.334067   0.223742   0.015100
#x3            0.057861  -0.075975  -0.099762

If you check str(fit), you will see that this is not a list of three linear models; instead, it is a single linear model with a single \$qr object, but with multiple LHS. So \$coefficients, \$residuals and \$fitted.values are matrices.

If you have a lot more covariates, you can avoid typing or pasting formula by using .:

fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)
#Coefficients:
#             y1         y2         y3
#(Intercept)  -0.081155   0.042049   0.007261
#x1           -0.037556   0.181407  -0.070109
#x2           -0.334067   0.223742   0.015100
#x3            0.057861  -0.075975  -0.099762

Caution: Do not write:

y1 + y2 + y3 ~ x1 + x2 + x3

This will treat y = y1 + y2 + y3 as a single response. Use cbind().

Follow-up:

Thank you for your response. I am interested in a generalization. Let me be more specific. I have a data frame df, where first n columns are independent variables (y1,y2,y3,....) and next m columns are the dependent variables (x1+x2+x3+....). For n=3 and m=3 it is the case with fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)). But how can this be done automatically, by using the structure of the df. I mean something like (for i in (1:n)) fit <- lm(cbind(df[something] ~ df[something], data = dat)). That "something" I have created it with paste and paste0. Thank you.

You can still create the model formula by paste and paste0:

paste0("cbind(", paste(names(df)[1:n], collapse = ", "), ")", " ~ .")

For example, using iris dataset:

paste0("cbind(", paste(names(iris)[1:2], collapse = ", "), ")", " ~ .")
# "cbind(Sepal.Length, Sepal.Width) ~ ."

You can pass this string formula to lm, as lm will automatically coerce it into formula class.

Source (Stackoverflow)