Eric Green - 1 year ago 68

R Question

In the minimal example below, I am trying to use the values of a character string `vars` in a regression. I know there are better ways to run the regression (e.g., `lm(v1 ~ v2 + v3 + v4, data=dat)`), but my full code produces the variable names as a string.

```
# minimal example

# create data frame
v1 <- rnorm(10)
v2 <- sample(c(0,1), 10, replace=TRUE)
v3 <- rnorm(10)
v4 <- rnorm(10)
dat <- cbind(v1, v2, v3, v4)
dat <- as.data.frame(dat)

# create objects of column names
c.2 <- colnames(dat)[2]
c.3 <- colnames(dat)[3]
c.4 <- colnames(dat)[4]

# shortcut to get to the type of object my full code produces
vars <- paste(c.2, c.3, c.4, sep="+")

### TRYING TO SOLVE FROM THIS POINT:
print(vars)
# [1] "v2+v3+v4"

# use vars in regression
regression <- paste0("v1", " ~ ", vars)
m1 <- lm(as.formula(regression), data=dat)
```

Update:

@Arun was correct about the missing "" on `v1` when pasting it together with `vars`.

Here's an example that does not work :) It uses the same data frame `dat`.

```
dv <- colnames(dat)[1]
r2 <- colnames(dat)[2]

# the following loop creates objects r3, r4, r5, and r6
# r5 and r6 are interaction terms
for (v in 3:4) {
  r <- colnames(dat)[v]
  assign(paste("r",v,sep=""),r)
  r <- paste(colnames(dat)[2], colnames(dat)[v], sep="*")
  assign(paste("r",v+2,sep=""),r)
}

# combine r3, r4, r5, and r6 then collapse and remove trailing +
vars2 <- sapply(3:6, function(i) {
  paste0("r", i, "+")
})
vars2 <- paste(vars2, collapse = '')
vars2 <- substr(vars2, 1, nchar(vars2)-1)

# concatenate dv, r2 (as a factor), and vars into `eq`
eq <- paste0(dv, " ~ factor(",r2,") +", vars2)
```

Here is the issue:

```
print(eq)
# [1] "v1 ~ factor(v2) +r3+r4+r5+r6"
```

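Unlike `regression`, `eq` does not contain the column names (e.g., `v3`). It contains the names of the objects (e.g., `r3`) instead, so `lm()` fails on the following call: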

`m2 <- lm(as.formula(eq), data=dat)`

Answer Source

I see a couple of issues here. First (though I don't think it is causing any trouble), let's make your data frame in one step so you don't have `v1` through `v4` floating around in both the global environment and the data frame. Second, let's make `v2` a factor here so that we won't have to deal with converting it later.

```
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))
```

**Part One** Now, for your first part, it looks like this is what you want:

```
lm(v1 ~ v2 + v3 + v4, data=dat)
```

Here's a simpler way to do that, though you still have to specify the response variable.

```
lm(v1 ~ ., data=dat)
```
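To see what the `.` shorthand stands for, you can ask `terms` to expand it against the data frame. This is only a quick check, assuming a `dat` shaped like the one above; the fixed `v2` values are my own stand-in so that the example is reproducible.

```r
# Stand-in data with the same columns as above (assumption: v2 is fixed, not sampled)
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# terms() expands "." into every column of dat except the response
dot_terms <- attr(terms(v1 ~ ., data = dat), "term.labels")
print(dot_terms)
# [1] "v2" "v3" "v4"
```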

Alternatively, you can certainly build up the formula with `paste` and call `lm` on it.

```
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
```
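As a side note to the `paste` approach above, base R also has `reformulate` (in the `stats` package), which builds a formula object from a character vector of term labels. It isn't part of what you asked about, so treat this as an optional sketch; the fixed `v2` values are an assumption made only for reproducibility.

```r
# Stand-in data with the same columns as above (assumption: v2 is fixed, not sampled)
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Build the formula from column names without pasting a string by hand
f_obj <- reformulate(names(dat)[-1], response = names(dat)[1])
print(f_obj)

# f_obj is already a formula, so no as.formula() is needed
m <- lm(f_obj, data = dat)
```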

However, my preference in these situations is to use `do.call`, which evaluates its arguments before passing them to the function; this makes the resulting object more suitable for calling functions like `update` on. Compare the `call` part of the output.

```
do.call("lm", list(as.formula(f), data=as.name("dat")))
```
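Here is a small side-by-side of the two stored calls, in case it helps to see the difference. It assumes the same column layout as above, with fixed `v2` values of my own choosing so that it runs reproducibly.

```r
# Stand-in data with the same columns as above (assumption: v2 is fixed, not sampled)
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))

m_direct <- lm(f, data = dat)
m_docall <- do.call("lm", list(as.formula(f), data = as.name("dat")))

# The direct call only records the symbol f
print(m_direct$call)
# The do.call version records the formula itself
print(m_docall$call)
```

Both models have the same coefficients; only the recorded call differs.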

**Part Two** About your second part, it looks like this is what you're going for:

```
lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)
```

First, because `v2` is already a factor in the data frame, we don't need the `factor()` call. Second, this can be simplified further by using R's formula operators to create the interactions, like this.

```
lm(v1 ~ v2*(v3 + v4), data=dat)
```
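If you want to convince yourself that the short form covers the same terms as the long one, `terms` will list what each formula expands to. No data is needed for this check.

```r
# Term labels for the compact formula
short_terms <- attr(terms(v1 ~ v2*(v3 + v4)), "term.labels")
print(short_terms)
# [1] "v2"    "v3"    "v4"    "v2:v3" "v2:v4"

# Term labels for the spelled-out formula
long_terms <- attr(terms(v1 ~ v2 + v3 + v4 + v2*v3 + v2*v4), "term.labels")

# Both formulas describe the same set of terms
print(setequal(short_terms, long_terms))
```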

I'd then simply create the formula using `paste`; the loop with `assign`, even in the larger case, is probably not a good idea.

```
f <- paste(names(dat)[1], "~", names(dat)[2], "* (",
           paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"
```

It can then be called using either `lm` directly or with `do.call`.

```
lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))
```

**About your code** The problem you had with trying to use `r3` etc. was that you wanted the contents of the variable `r3`, not the name `r3` itself. To get the contents, you need `get`, like this, and then you'd collapse the values together with `paste`.

```
vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
```
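To make the name-versus-contents difference concrete, here is a small sketch. The `r3` to `r6` values are written out by hand to match what your loop creates, and I've used `mget` (the multi-name version of `get`) with an explicit `envir`, which is my own choice rather than something your code requires.

```r
# The objects your loop creates, written out by hand for illustration
r3 <- "v3"
r4 <- "v4"
r5 <- "v2*v3"
r6 <- "v2*v4"

# Pasting the names gives the names, which is what ended up in eq
by_name <- paste0("r", 3:6, collapse = " + ")
print(by_name)
# [1] "r3 + r4 + r5 + r6"

# Looking the names up gives their contents
vars <- unlist(mget(paste0("r", 3:6), envir = environment()))
by_value <- paste(vars, collapse = " + ")
print(by_value)
# [1] "v3 + v4 + v2*v3 + v2*v4"
```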

However, a better way would be to avoid `assign` and just build a vector of the terms you want, like this.

```
vars <- NULL
for (v in 3:4) {
  vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2],
                                          colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")
```

A more R-like solution would be to use `lapply`:

```
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
```
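For completeness, here is one way the `lapply` result could be carried through to a fitted model. It is a sketch under the same assumptions as before: a `dat` with columns `v1` to `v4`, and fixed `v2` values chosen by me for reproducibility.

```r
# Stand-in data with the same columns as above (assumption: v2 is fixed, not sampled)
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Each remaining column, followed by its interaction with v2
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))

# Paste the response, v2, and the collected terms into one formula string
f <- paste(colnames(dat)[1], "~", colnames(dat)[2], "+",
           paste(vars, collapse=" + "))
print(f)
# [1] "v1 ~ v2 + v3 + v2*v3 + v4 + v2*v4"

m <- lm(as.formula(f), data=dat)
fitted_terms <- attr(terms(m), "term.labels")
```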