Gracos - 1 year ago
HTML Question

Web Scrape Non Farm Payrolls dates in R

I would like to scrape past release dates for Non-Farm Payrolls (NFP) from here (archive) and here (current year).

Peter Chan achieved something similar for FOMC dates here. This is his code:

install.packages(c("httr", "XML"), repos = "")
library(httr)  # GET(), content()
library(XML)   # htmlParse(), xpathSApply(), xmlGetAttr()

# get and parse web page content
webpage <- content(GET(""), as="text")
xhtmldoc <- htmlParse(webpage)
# get statement urls and sort them
statements <- xpathSApply(xhtmldoc, "//td[@class='statement2']/a", xmlGetAttr, "href")
statements <- sort(statements)
# get dates from statement urls
fomcdates <- sapply(statements, function(x) substr(x, 28, 35))
fomcdates <- as.Date(fomcdates, format="%Y%m%d")
# save results in working directory
save(list = c("statements", "fomcdates"), file = "fomcdates.RData")

I would like to replicate that for NFP: just as fomcdates contains all FOMC dates, I would like to create NFPdates containing all NFP dates.

Would anyone know how to do this for the current year only? (I am asking about the current year because it seems to be the simplest case.) Thank you.

Answer

This works for the current year.


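library(rvest)  # provides html_session() and html_table()
url <- ''
# open a session on the schedule page and parse every table on it
# (rvest >= 1.0 renames html_session() to session())
ses <- html_session(url)
tbl <- html_table(ses, fill = TRUE)
# the code assumes the second table is the schedule; take its date column
nfpdates <- tbl[[2]]$`Release Date`
# strip the period after abbreviated month names, then convert to Date
nfpdates <- gsub('\\.', '', nfpdates)
nfpdates <- as.Date(nfpdates, '%b %d, %Y')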
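
The cleaning and conversion steps can be tried without touching the network. The sketch below is only an illustration: the input strings are made up to mimic the "Mon. DD, YYYY" layout that the code above assumes and were not scraped from the real page, and the file name NFPdates.RData is an assumption modelled on the fomcdates.RData file in the question.

```r
# Illustrative strings only: they mimic the "Mon. DD, YYYY" layout assumed
# by the answer's code and are NOT taken from the real schedule page.
raw_dates <- c("Jan. 05, 2018", "Feb. 02, 2018", "Mar. 09, 2018")

# Drop the period after the month abbreviation, as in the answer.
clean_dates <- gsub(".", "", raw_dates, fixed = TRUE)

# %b matches abbreviated month names in the current locale, so an
# English (or C) locale is assumed here.
NFPdates <- as.Date(clean_dates, format = "%b %d, %Y")

# Strings that do not match the format come back as NA, so check first.
stopifnot(!anyNA(NFPdates))

# Save under the name asked for in the question (file name is an assumption).
save(NFPdates, file = "NFPdates.RData")
```

If as.Date() returns NA for some rows, the usual suspects are a non-English locale or a month written in a form that %b does not recognise, so it is worth inspecting the raw strings before saving.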