I'm trying to read a zipped folder called etfreit.zip contained in Purchases from April 2016 onward.
Inside the zipped folder is a file called 2016.xls which is difficult to read as it contains empty rows along with Japanese text.
I have tried various ways of reading the xls from R, but I keep getting errors. This is the code I tried:
data <- read.csv(text=readLines("2016.xls")[-(1:10)])
There is more than one bizzare thing going on here I think, but I had some success with (somewhat older)
data = gdata::read.xls("2016.xls")
By the way, treating xls file as csv seldom works. Actually it shouldn't work at all :) Find out a proper import function for your type of data and then use it, don't assume that
read.csv is going to take care about anything else than csv (properly).
As per your comment: I'm not sure what you mean by "not properly aligned", but here is some code that cleans the data a bit, and gives you numeric variables instead of factors (note I'm using
tidyr for that):
data2 = data[-c(1:7), -c(1, 6)] names(data2) = c("date", "var1", "var2", "var3") data2[, c(2:4)] = sapply(data2[, c(2:4)], tidyr::extract_numeric) # Optionally convert the column with factor dates to Posixct data2$date = as.POSIXct(data2$date)
Also, note that I am removing only 7 upper rows - this seems to be the portion of the data that contains the header with Japanese.