bogdan.narcis - 8 months ago 32

R Question

I am new to R and I want to improve the following script with an

`*apply`

`apply`

`lm`

`for (i in (1:3) {`

assign(paste0('lm.',names(data[i])), lm(formula=formula(i),data=data))

}

`Formula(i)`

`formula=function(x)`

{

as.formula ( paste(names(data[x]),'~', paste0(names(data[-1:-3]), collapse = '+')), env=parent.frame() )

}

Thank you.

Answer

If I don't get you wrong, you are working with a dataset like this:

```
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
```

`x1`

, `x2`

and `x3`

are covariates, while `y1`

, `y2`

and `y3`

are three independent response. You are trying to fit three linear models:

```
y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
y3 ~ x1 + x2 + x3
```

Currently you are using a loop through `y1`

, `y2`

, `y3`

, fitting one model per time. And you hope to speed the process up by replacing the `for`

loop with, say, `lapply`

.

**I would like to point out, that you are on the wrong track.** `lm()`

is an expensive operation. As long as your dataset is not small, the costs of the `for`

loop is negligible. Replacing `for`

loop with `lapply`

gives no performance gains.

Since you have the same RHS for all three models, model matrix is the same for three models. Therefore, QR factorization for all models need only be done once. `lm`

allows this, and you can use:

```
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
#Coefficients:
# y1 y2 y3
#(Intercept) -0.081155 0.042049 0.007261
#x1 -0.037556 0.181407 -0.070109
#x2 -0.334067 0.223742 0.015100
#x3 0.057861 -0.075975 -0.099762
```

If you check `str(fit)`

, you will see that this is not a list of three linear models; instead, it is a single linear model with a single `$qr`

object, but with multiple LHS. So `$coefficients`

, `$residuals`

and `$fitted.values`

are matrices.

If you have a lot more covariates, you can avoid typing or pasting formula by using `.`

:

```
fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)
#Coefficients:
# y1 y2 y3
#(Intercept) -0.081155 0.042049 0.007261
#x1 -0.037556 0.181407 -0.070109
#x2 -0.334067 0.223742 0.015100
#x3 0.057861 -0.075975 -0.099762
```

**Caution:** Do not write:

```
y1 + y2 + y3 ~ x1 + x2 + x3
```

This will treat `y = y1 + y2 + y3`

as a single response. Use `cbind()`

.

**Follow-up:**

Thank you for your response. I am interested in a generalization. Let me be more specific. I have a data frame

`df`

, where first`n`

columns are independent variables`(y1,y2,y3,....)`

and next`m`

columns are the dependent variables`(x1+x2+x3+....)`

. For`n=3`

and`m=3`

it is the case with`fit <- lm(cbind(y1, y2, y3) ~ ., data = dat))`

. But how can this be done automatically, by using the structure of the`df`

. I mean something like`(for i in (1:n)) fit <- lm(cbind(df[something] ~ df[something], data = dat))`

. That "something" I have created it with`paste`

and`paste0`

. Thank you.

You can still create the model formula by `paste`

and `paste0`

:

```
paste0("cbind(", paste(names(df)[1:n], collapse = ", "), ")", " ~ .")
```

For example, using `iris`

dataset:

```
paste0("cbind(", paste(names(iris)[1:2], collapse = ", "), ")", " ~ .")
# "cbind(Sepal.Length, Sepal.Width) ~ ."
```

You can pass this string formula to `lm`

, as `lm`

will automatically coerce it into formula class.