Nathan - 3 months ago
HTML Question

When scraping with rvest expected html_node not appearing

The ITTO website displays a table of timber products and flows directly under the search form once the query is submitted (on the same page). Based on what Chrome's SelectorGadget shows, I expect the table cells to match the CSS selector "td". Using rvest to scrape the data for Albania for 2014:

library(rvest)

session <- html_session("http://www.itto.int/annual_review_output/?mode=searchdata")
form <- html_form(session)[[2]]
form <- set_values(form, "countries[]" = "8", "products[]" = "1", "flows[]" = "1", "years[]" = "2014")
query <- submit_form(session, form, submit = NULL)
page <- read_html(query) %>% html_nodes("td")
page


This returns an empty node set, with no "td" elements:

{xml_nodeset (0)}


Examining other elements of the page with html_nodes() suggests that submit_form() performed otherwise as expected.

So my question is: where is the expected table?

Answer

It might be easier (in the long run) to scrape the select box options and just feed the POST call directly:

library(httr)
library(rvest)

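# Send the form fields directly as a URL-encoded POST body.
# Country code "76" is Brazil (see the output below); the other
# field values are the same as in the question.
res <- POST(url = "http://www.itto.int/annual_review_output/?mode=searchdata",
            body = list(`countries[]` = "76",
                        `products[]` = "1", `flows[]` = "1",
                        `years[]` = "2014"),
            encode = "form")

# Parse the response as HTML and select the table cells.
pg <- content(res, as = "parsed")
html_nodes(pg, "td")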

## {xml_nodeset (7)}
## [1] <td>Brazil</td>
## [2] <td>Ind. roundwood</td>
## [3] <td>Exports Quantity</td>
## [4] <td>1000 m3</td>
## [5] <td>2014</td>
## [6] <td style="text-align:right;">204.59</td>
## [7] <td>I</td>
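
To find the code for a given country, the select box options can be read into a lookup table. The following is only a rough sketch: the inline HTML is a hypothetical stand-in for the search form (the real page's markup may differ), and the two option values are taken from the code above ("8" for Albania in the question, "76" for Brazil here).

```r
library(rvest)

# Hypothetical stand-in for the search form; the real markup may differ.
form_html <- '
<form>
  <select name="countries[]">
    <option value="8">Albania</option>
    <option value="76">Brazil</option>
  </select>
</form>'

pg <- read_html(form_html)

# Select every <option> inside the countries[] select box.
opts <- html_nodes(pg, xpath = "//select[@name='countries[]']/option")

# Build a code/name lookup table from the option attributes and labels.
countries <- data.frame(
  code = html_attr(opts, "value"),
  name = html_text(opts),
  stringsAsFactors = FALSE
)

# Look up the code to pass as `countries[]` in the POST body.
countries$code[countries$name == "Brazil"]
## [1] "76"
```

On the live site, `pg` would presumably come from `read_html()` on the search page URL instead of the inline string, and the same approach should apply to the products, flows and years select boxes.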