Stick - 3 years ago 144

R Question

First off, I am pretty new to this, so my method or thinking may be wrong. I have imported an xlsx data set into a data frame using R and RStudio. I want to loop through the column names to get all of the variables with exactly "_10_" in them, in order to run a simple linear regression. So here's my code:

`indx <- grepl('_10_', colnames(data)) # logical vector: TRUE for each column name containing '_10_'`

`col10 <- names(data[indx]) # this gives me the names of the columns I want`

Here is the for loop I have which returns an error:

`temp <- c()`

`for(i in 1:length(col10)){`

`temp = col10[[i]]`

`lm.test <- lm(Total_Transactions ~ temp[[i]], data = data)`

`print(temp) #actually prints out the right column names`

`i + 1`

`}`

Is it even possible to run a loop to place those variables in the linear regression model? The error I am getting is: "Error in model.frame.default(formula = Total_Transactions ~ temp[[i]], : variable lengths differ (found for 'temp[[i]]')". If anyone could point me in the right direction I would be very grateful. Thanks.


Answer Source

OK, I'll post an answer. I will use the dataset `mtcars` as an example; I believe the same approach will work with your dataset.

First, I create a store, `lm.test`, an object of class `list`. In your code you assign the output of `lm(.)` to the same variable every time through the loop, so in the end you would only have the last fit; all the others would have been overwritten by the newer ones.

Then, inside the loop, I use the function `reformulate` to put together the regression formula. There are other ways of doing this, but this one is simple.

```
# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- vector("list", length(col10))
for (i in seq_along(col10)) {
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}
lm.test
```
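As a rough illustration of why the original loop failed while this one works, the sketch below compares the two approaches on `mtcars`. The variable `temp` here only mimics the one in the question; in that loop `temp[[i]]` is a single character string, not a column, so `lm` should complain that the variable lengths differ.

```r
# What reformulate() builds: a real formula object from character strings
f <- reformulate("wt", response = "mpg")
print(f)

# Mimic the question's loop: temp holds a column *name*, not the column itself
temp <- "wt"
res <- tryCatch(
  lm(mpg ~ temp[[1]], data = mtcars),          # length-1 string vs. 32 rows of mpg
  error = function(e) conditionMessage(e)      # keep the error message instead of stopping
)
print(res)

# The same column name passed through reformulate() fits without trouble
fit <- lm(f, data = mtcars)
```

The key point is that a formula needs variable names as symbols; `reformulate` does that conversion from strings for you.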

Now you can use the results list for all sorts of things. I suggest you start using `lapply` and friends for that.

For instance, to extract the coefficients:

```
cfs <- lapply(lm.test, coef)
```
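If you want the slopes side by side rather than as a list, something along these lines may help. It is only a sketch: it rebuilds the fits with `lapply` so that it runs on its own, and the name `slopes` is just illustrative.

```r
# Rebuild the example fits so this snippet is self-contained
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- lapply(col10, function(v) lm(reformulate(v, "mpg"), data = data))
names(lm.test) <- col10                 # label each fit with its predictor

cfs <- lapply(lm.test, coef)

# Each fit has two coefficients: the intercept, then the slope
slopes <- vapply(cfs, function(cf) cf[[2]], numeric(1))
slopes
```

Naming the list elements up front means every later `lapply` or `vapply` result carries the predictor names along.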

In order to get the summaries:

```
smry <- lapply(lm.test, summary)
```
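The summaries can be mined in the same way. As a hedged example, the snippet below pulls the R-squared and the p-value of the slope from each summary; `r2` and `pvals` are names I made up for the illustration.

```r
# Rebuild the example fits so this snippet is self-contained
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- lapply(col10, function(v) lm(reformulate(v, "mpg"), data = data))
names(lm.test) <- col10

smry <- lapply(lm.test, summary)

# R-squared of each simple regression
r2 <- vapply(smry, function(s) s$r.squared, numeric(1))

# p-value of the slope: row 2 of the coefficient table
pvals <- vapply(smry, function(s) coef(s)[2, "Pr(>|t|)"], numeric(1))

data.frame(predictor = col10, r.squared = r2, p.value = pvals, row.names = NULL)
```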

It becomes very simple once you're familiar with the `*apply` functions.
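To tie this back to the question, here is a minimal sketch of the same idea applied to columns whose names contain `_10_`. The data frame `df` and its column names are invented for the example, since I don't have your data; only `Total_Transactions` and the `_10_` pattern come from your post.

```r
# Hypothetical stand-in for the imported xlsx data
set.seed(1)
df <- data.frame(
  Total_Transactions = rnorm(20),
  a_10_x = rnorm(20),
  b_10_y = rnorm(20),
  c_20_z = rnorm(20)
)

# Names of the columns containing the literal text "_10_"
col10 <- grep("_10_", names(df), fixed = TRUE, value = TRUE)

# One simple regression per selected column, stored in a named list
fits <- lapply(col10, function(v) {
  lm(reformulate(v, response = "Total_Transactions"), data = df)
})
names(fits) <- col10

lapply(fits, coef)
```

Using `grep(..., value = TRUE)` returns the matching names directly, so the separate logical index is not needed.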
