I would like to scrape past release dates for Non-Farm Payrolls from http://www.bls.gov/bls/archived_sched.htm (archive) and http://www.bls.gov/schedule/news_release/empsit.htm (current year).
Something similar was achieved by Peter Chan for FOMC dates here: https://github.com/returnandrisk/r-code/blob/master/FOMC%20Dates%20-%20Scraping%20Data%20From%20Web%20Pages.R. This is his code:
install.packages(c("httr", "XML"), repos = "http://cran.us.r-project.org")
# load the packages that provide GET/content and htmlParse/xpathSApply
library(httr)
library(XML)
# get and parse web page content
webpage <- content(GET("http://www.federalreserve.gov/monetarypolicy/fomccalendars.htm"), as="text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href")
statements <- sort(statements)
# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format="%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")
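One thing I noticed in the FOMC code is that `substr(x, 28, 35)` depends on every statement URL having the same prefix length. A minimal sketch of a position-independent variant, assuming each URL contains exactly one run of eight digits, is below. The two sample URLs are made up for illustration and only mimic the shape of the real ones.

```r
# Hypothetical statement URLs (illustrative only, not scraped from the page)
statements <- c("/newsevents/press/monetary/20150128a.htm",
                "/newsevents/press/monetary/20150318a.htm")

# Pull the first run of 8 digits out of each URL instead of using fixed positions.
# Note: regmatches() silently drops elements that have no match.
datestr <- regmatches(statements, regexpr("[0-9]{8}", statements))

# Convert the YYYYMMDD strings to Date objects
fomcdates <- as.Date(datestr, format = "%Y%m%d")
```

For these sample URLs the result should match what the `substr` version returns, since the date starts at character 28 in both.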
For Non-Farm Payrolls, the following works for the current year:
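library(rvest)
url <- 'http://www.bls.gov/schedule/news_release/empsit.htm'
ses <- html_session(url)
# parse every table on the page into a list of data frames
tbl <- html_table(ses, fill = TRUE)
# keep the first table that has a 'Release Date' column
nfp <- Filter(function(x) "Release Date" %in% names(x), tbl)[[1]]
# strip the periods after abbreviated month names, then convert to Date
nfpdates <- gsub('\\.', '', nfp$`Release Date`)
nfpdates <- as.Date(nfpdates, '%b %d, %Y')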
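To check the date-cleaning step on its own, without hitting the BLS site, here is a minimal sketch on a few made-up strings in the format I assume the `Release Date` column uses (abbreviated month, optional period, day, year). It also assumes an English locale, because `%b` is matched against locale-specific month names.

```r
# Hypothetical release-date strings (illustrative only, not scraped)
raw <- c("Jan. 05, 2018", "Feb. 02, 2018", "May 04, 2018")

# Remove the literal periods after abbreviated month names
clean <- gsub(".", "", raw, fixed = TRUE)

# Parse "Mon DD, YYYY"; strings that do not fit the format become NA
parsed <- as.Date(clean, format = "%b %d, %Y")
```

Any entry that comes back as `NA` would point to a row whose text does not follow this format, which seems worth checking before reusing the same approach on the archive pages.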