Rilcon42 - 1 month ago
HTML Question

Finding the correct XPath for a table without an id

I am following a tutorial on R-Bloggers that uses rvest to scrape a table. I think I have the wrong column id value in my XPath, but I don't understand how to find the correct one. Can someone explain what value I should use, and why?

library("rvest")
interest<-url("http://online.wsj.com/mdc/public/page/2_3020-libor.html")%>%read_html()%>%html_nodes(xpath='//*[@id="column0"]/table[1]') %>% html_table()


The structure returned is an empty list.

Answer

For me, finding the correct table is usually a matter of trial and error. In this case, the third table is the one you are looking for:

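library("rvest")
# Download and parse the page once, so the result can be inspected step by step
page <- url("http://online.wsj.com/mdc/public/page/2_3020-libor.html") %>% read_html()
# Collect every <table> element on the page
tables <- html_nodes(page, "table")
# The third table holds the data; html_table() on tables[3] returns a one-element list
html_table(tables[3])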

Instead of using an XPath, I selected every "table" tag and looked at each table in turn to locate the correct one. Piping is handy, but chaining everything into one expression makes it harder to debug when something goes wrong, because no intermediate result is available to inspect.
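
The trial-and-error step can be made a little more systematic by printing the size of every table before picking one. The following is only a rough sketch: the inline HTML is a hypothetical stand-in for the downloaded page (three tables, none with an id, filled with made-up placeholder values), so it runs without network access and does not reflect the real page's contents.

```r
library(rvest)

# Hypothetical stand-in for the downloaded page; the values are placeholders
html <- '<html><body>
  <table><tr><td>navigation</td></tr></table>
  <table><tr><td>advert</td></tr></table>
  <table>
    <tr><th>Rate</th><th>Latest</th></tr>
    <tr><td>example rate A</td><td>1.0</td></tr>
    <tr><td>example rate B</td><td>2.0</td></tr>
  </table>
</body></html>'

page <- read_html(html)
tables <- html_nodes(page, "table")

# Parse every table, then report each one's dimensions to spot the data table
parsed <- html_table(tables)
for (i in seq_along(parsed)) {
  cat("table", i, ":", nrow(parsed[[i]]), "rows x", ncol(parsed[[i]]), "cols\n")
}

# Once the index is known, [[ ]] gives the data frame itself rather than a list
rates <- parsed[[3]]

# The same table can also be addressed by position in an XPath, with no id needed
rates_xpath <- html_nodes(page, xpath = "(//table)[3]") %>% html_table()
```

On the real page, the index found this way can then be used in a positional XPath such as "(//table)[3]", assuming the page layout stays the same between requests.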