Question

rbind txt files from online directory (R)

I am trying to concatenate text files from a URL, but I don't know how to do this given the HTML directory listing and the different folders.

This is the code I tried, but it only lists the text files, and the output is full of HTML code. How do I fix this so that I can combine the text files into one CSV file?

library(RCurl)
library(readr)

url <- "http://weather.ggy.uga.edu/data/daily/"
dir <- getURL(url, dirlistonly = TRUE)
filenames <- unlist(strsplit(dir, "\n"))  # split the listing into file names

# append the files one after another
for (i in 1:length(filenames)) {
  file <- paste0(url, filenames[i])  # build the full URL for this file
  if (i == 1) {
    cp <- read_delim(file, delim = ",", col_names = FALSE)
  } else {
    temp <- read_delim(file, delim = ",", col_names = FALSE)
    cp <- rbind(cp, temp)  # append to the existing data
    rm(temp)               # drop the temporary copy
  }
}
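
The underlying issue is that this is a plain HTTP URL, so getURL returns the server's HTML index page rather than a bare file listing (dirlistonly only produces a name-only list for FTP). The .txt names therefore have to be pulled out of the markup before the loop above will read real data files. A minimal sketch of that step, not from the original post, using a hypothetical regex that assumes a standard Apache-style index page:

library(RCurl)

url  <- "http://weather.ggy.uga.edu/data/daily/"
page <- getURL(url)  # over HTTP this is the HTML index page, not a file list

# pull out anything that looks like a .txt file name
# (hypothetical regex, assuming Apache-style <a href="x.txt">x.txt</a> rows)
filenames <- unique(regmatches(page, gregexpr('[^"/>]+\\.txt', page))[[1]])

With filenames built this way, the rest of the loop above can stay as it is.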

Answer

Here is a code snippet that I got to work for me. I like to use rvest over RCurl, just because that's what I've learned. In this case, I was able to use the html_nodes function to isolate each file ending in .txt. The resulting table has the times saved as character strings, but you could fix that later. Let me know if you have any questions.

library(rvest)
library(readr)

url <- "http://weather.ggy.uga.edu/data/daily/"

# read the directory index page and pull the text of links containing ".txt"
doc <- xml2::read_html(url)
text <- rvest::html_text(rvest::html_nodes(doc, "tr td a:contains('.txt')"))

# define column types of the fwf data ("c" = character, "n" = number)
ctypes <- paste0("c", paste0(rep("n", 11), collapse = ""))
data <- data.frame()

for (i in seq_along(text)) {
  file <- paste0(url, text[i])

  # the first line of each file holds the date; the fixed-width data start after line 3
  date <- as.Date(read_lines(file, n_max = 1), "%m/%d/%y")
  data <- rbind(data, cbind(date, read_fwf(file, fwf_empty(file, skip = 3),
                                           skip = 3, col_types = ctypes)))
}
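
To finish the job the question actually asks about, the combined data frame can be written out as a single CSV, and the character time strings mentioned above can be converted afterwards. A short follow-up sketch; the output file name is arbitrary, and the time column (X1, the one character column in ctypes) and the "%H:%M" format are assumptions about these particular files:

library(readr)

# write the combined table to one CSV file (file name is arbitrary)
write_csv(data, "daily_combined.csv")

# optional: convert the character times to date-times, assuming the first
# fixed-width column (X1) holds times such as "07:30"
# data$X1 <- as.POSIXct(paste(data$date, data$X1), format = "%Y-%m-%d %H:%M")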