First off, I am pretty new to this, so my method or thinking may be wrong. I have imported an xlsx data set into a data frame using R and RStudio. I want to loop through the column names to get all of the variables with exactly "10" in them, in order to run a simple linear regression on each one. Here's my code:
indx <- grepl('_10_', colnames(data)) #list returns all of the true values in the data set
col10 <- names(data[indx]) #this gives me the names of the columns I want
temp <- c()
for(i in 1:length(col10)){
temp = col10[[i]]
lm.test <- lm(Total_Transactions ~ temp[[i]], data = data)
print(temp) #actually prints out the right column names
i + 1
}
OK, I'll post an answer. I will use the dataset mtcars as an example. I believe it will work with your dataset.
First, I create a store, lm.test, an object of class list. In your code you are assigning the output of lm(.) to the same object every time through the loop, so in the end you would only have the last fit; all the others would have been overwritten by the newer ones.
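Then, inside the loop, I use the function reformulate to put together the regression formula from the column name. This is needed because a formula written as Total_Transactions ~ temp[[i]] uses the character string itself as the predictor, not the column it names. There are other ways of doing this, but this one is simple.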
# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- vector("list", length(col10))
for(i in seq_along(col10)){
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}
lm.test
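To connect this back to the question, here is a minimal sketch of the same loop applied to columns whose names contain '_10_'. The data frame below is made up for illustration (the column names A_10_x, B_10_y and C_100_z are assumptions, not the real spreadsheet); only Total_Transactions comes from the question.

```r
# Hypothetical stand-in for the imported spreadsheet
set.seed(1)
data <- data.frame(
  Total_Transactions = rnorm(20),
  A_10_x  = rnorm(20),
  B_10_y  = rnorm(20),
  C_100_z = rnorm(20)  # contains "_100_", so it should not match "_10_"
)

# grep(value = TRUE) returns the matching column names directly;
# fixed = TRUE treats the pattern as a literal string, not a regex
col10 <- grep("_10_", colnames(data), value = TRUE, fixed = TRUE)

# One slot per selected column, named after the column it holds
lm.test <- vector("list", length(col10))
names(lm.test) <- col10

for (i in seq_along(col10)) {
  # reformulate() builds Total_Transactions ~ <column name>
  lm.test[[i]] <- lm(reformulate(col10[i], response = "Total_Transactions"),
                     data = data)
}

names(lm.test)
```

Naming the list elements is optional, but it makes it easier to tell later which fit belongs to which column, for example lm.test[["A_10_x"]]. If the real column names are not syntactically valid R names (spaces, leading digits), they would probably need backticks before being passed to reformulate.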
Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.
For instance, to extract the coefficients:
cfs <- lapply(lm.test, coef)
In order to get the summaries:
smry <- lapply(lm.test, summary)
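As one possible next step, the summaries can be reduced to a small comparison table. This is only a sketch: it rebuilds the mtcars fits so it runs on its own, and the choice of R-squared and the slope p-value as the quantities of interest is an assumption about what you want to compare.

```r
# Rebuild the fits from the example so this block is self-contained
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))
names(lm.test) <- col10
for (i in seq_along(col10)) {
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

smry <- lapply(lm.test, summary)

# sapply() simplifies the per-model results into named numeric vectors
r2 <- sapply(smry, function(s) s$r.squared)

# coef() on a summary.lm returns the coefficient table;
# row 2 is the single predictor, "Pr(>|t|)" is its p-value column
slope_p <- sapply(smry, function(s) coef(s)[2, "Pr(>|t|)"])

results <- data.frame(r.squared = r2, slope.p.value = slope_p)
results
```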
It becomes very simple once you're familiar with the *apply functions.
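For comparison, the for loop itself can also be written with lapply. The sketch below is one way to do it, not the only one; fit_one is a hypothetical helper name introduced here for illustration.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

# Hypothetical helper: fit mpg against a single named column
fit_one <- function(v) lm(reformulate(v, response = "mpg"), data = data)

# setNames() makes lapply() return a list named by column,
# so no pre-allocated store or index variable is needed
lm.test <- lapply(setNames(col10, col10), fit_one)

# Collect intercept and slope of every fit into one matrix,
# one row per predictor
cfs <- t(sapply(lm.test, coef))
colnames(cfs) <- c("intercept", "slope")
cfs
```

The column names are reset by hand because each model's slope is labelled with its own predictor, so the labels sapply picks up from the first model would be misleading for the rest.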