Alex Alex - 1 year ago 85
R Question

R read excel by column names

So I have a bunch of excel files I want to loop through and read specific, discontinuous columns into a data frame. Using the

readxl
works for the basic stuff like this:

library(readxl)
library(plyr)
wb <- list.files(pattern = "*.xls")
dflist <- list()

for (i in wb){
dflist[[i]] <- data.frame(read_excel(i, sheet = "SheetName", skip=3, col_names = TRUE))
}

# now put them into a data frame
data <- ldply(dflist, data.frame, .id = NULL)


This works (barely) but the problem is my excel files have about 114 columns and I only want specific ones. Also I do not want to allow
R
to guess the
col_types
because it messes some of them up (eg for a string column, if the first value starts with a number, it tries to interpret the whole column as numeric, and crashes). So my question is: How do I specify specific, discontinuous columns to read? The
range
argument uses the
cell_ranger
package which does not allow for reading discontinuous columns. So any alternative?

Ape Ape
Answer Source

The read.xlsx function from openxlsx package has a parameter cols that takes a numeric index that specifies which columns to read.

It seems to read all columns as characters if at least one column is character.

Edit: For .xls files, see the XLConnect package. Installing rJava might be tricky, thought. The keep and drop parameters of readWorksheet() accept column names too. Parameter colTypes deals with column types. This way it works for me:

options(java.home = "C:\\Program Files\\Java\\jdk1.8.0_74\\")
library(rJava)
library(XLConnect)
workbook <- loadWorkbook("test.xls")
readWorksheet(workbook, sheet = "Sheet0", keep = c(1,2,5))
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download