Eric Shierman Eric Shierman - 1 month ago 6
R Question

Can the R "for" loop be used in an object name and an excel document name at the same time?

This is my first attempt at a loop in R and I don't know if it's even possible to do what I'm trying to do, but if so, I think this would be useful to many people who might find your answer to my question in a Google search.

I'm trying to append 89 Excel spreadsheets together. These spreadsheets have dozens of sheets, and I need to select the one I want as well as delete the first three rows. I know how to do all of this one document at a time, but with 89 documents, boy wouldn't it would be nice to automate this.

One thing that helps is that each document name ends in a date. In my example each document is one day's electricity price data. Because of the date being in the document's name I was hoping to use the first_date:last_date construction.

Here's the example of the coding for a single document upload:

MDFD_20170207 <- read_excel("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_20170207.xlsx",
sheet = "Bilateral Indexes", col_names = FALSE,
skip = 3)

And here was my attempt to apply basic R "for" looping material that I've read to this case:

for (i in 20170207:20170210){
print(paste(,i<- read_excel("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_,i.xlsx",
sheet = "Bilateral Indexes", col_names = FALSE,
skip = 3)

It didn't work, I got the following error message:

Error in paste(, i <- read_excel("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_,i.xlsx", :
argument is missing, with no default

I'm not sure what that means. For example, what argument is it saying is missing?

I hope I've written this with enough explanation so that if a workable answer is given, other people will be able to save time at work not having to upload one document at a time when appending big data sets.

Update: Here's another approach I've been working on:

df <- data.frame()
full_path <- "O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/"
docs <- c(20170207:20170209)
for (f in docs){
filename <- paste0(full_path, f,".xlsx")
tmp_df <- read_excel(filename, sheetName = "Bilateral Indexes", col_names = FALSE, skip = 3)
df <- rbind(df,tmp_df)

This all appears to work, in terms of structure, but then it's not accepting my read_excel command:

Error in sheets_fun(path) :
Evaluation error: zip file 'O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/20170207.xlsx' cannot be opened.

Answer Source

You could create the file name string first and then use that variable in your read_excel call. I'm assuming you want to append all files' data into one table and that each file has the same structure (namely column names)...

#create data frame to store rows for all Excel files
all.rows <- data.frame()

#loop through files and append data to said data frame     
for (i in 20170207:20170210){
    filename = paste("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_",i,".xlsx",sep='')
    tmp_table <- read_excel(filename,sheet = "Bilateral Indexes", col_names = FALSE, skip = 3))
    #subset tmp_table to desired columns
    all.rows <- rbind(all.rows, tmp_table[,c('peak prices','off-peak prices')])

#now you can perform calculations on the data frame [replace <column> with your column name]
mean.var <- mean(all.rows$<column>)