Allessandro PT - 1 year ago
R Question

Using rvest to scrape a website - Selecting html node?

I have a question about my latest rvest scrape.

I want to scrape this page (and some other stocks as well):

I need a list of Market Cap values. The Market Cap is the first box in the second line of the page.
The list should cover approximately 50-100 stocks.

I am using rvest for that.


html = read_html("")

cast = html_nodes(html, "table-dark-row")

The problem is that I cannot work out what to pass to html_nodes.
Any idea how to find the correct selector for html_nodes?

I am using Firebug/FireFinder to inspect the webpage.

Answer Source

I'm not sure this is what you want, because I cannot find a list with approx. 50-100 stocks.

But for what it's worth, using SelectorGadget I came up with the selector .table-dark-row:nth-child(2) .snapshot-td2:nth-child(2) to select the Market Cap (the first box in the second line of this page):

> library(rvest)
> html = read_html("")
> cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
> cast
{xml_nodeset (1)}
[1] <td width="8%" class="snapshot-td2" align="left">\n  <b>11.58B</b>\n</td>
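
A likely reason the attempt in the question returned nothing is that "table-dark-row" without a leading dot is read as a tag name, not a class. The sketch below shows the difference on a small inline HTML fragment. The fragment is an assumption that only imitates the class names visible in the output above; it is not the real page.

```r
library(rvest)

# Hypothetical fragment imitating the layout of the real page (assumption).
page <- read_html('
<table>
  <tr class="table-dark-row">
    <td class="snapshot-td2-cp">Index</td>
    <td class="snapshot-td2"><b>-</b></td>
  </tr>
  <tr class="table-dark-row">
    <td class="snapshot-td2-cp">Market Cap</td>
    <td class="snapshot-td2"><b>11.58B</b></td>
  </tr>
</table>')

# Without the dot, the selector looks for <table-dark-row> elements.
no_dot <- html_nodes(page, "table-dark-row")

# With the dot, it matches elements whose class is table-dark-row.
with_dot <- html_nodes(page, ".table-dark-row")

# The full selector narrows this down to the Market Cap cell.
hit <- html_nodes(page, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
market_cap <- html_text(hit, trim = TRUE)
```

Here no_dot is empty, with_dot holds both rows, and market_cap holds the text of the single matching cell.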

If this is not exactly what you want, use SelectorGadget to locate the element you need.

Hope this helps.
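
For the list of 50-100 stocks mentioned in the question, one option is to apply the same selector to each page in turn. This is only a sketch under the assumption that every stock page shares the same layout. The helper get_market_cap, the tickers AAA and BBB, and the stand-in pages written to temporary files are all hypothetical; in real use the vector would hold the page URLs, which the post does not show.

```r
library(rvest)

# Selector taken from the answer above.
market_cap_selector <- ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)"

# Hypothetical helper: read one page (URL or file path) and return the cell text.
get_market_cap <- function(source) {
  page <- read_html(source)
  node <- html_nodes(page, market_cap_selector)
  if (length(node) == 0) return(NA_character_)  # selector matched nothing
  html_text(node, trim = TRUE)[1]
}

# Stand-in pages saved to temporary files so the sketch runs offline (assumption).
make_page <- function(cap) {
  path <- tempfile(fileext = ".html")
  writeLines(sprintf('<html><body><table>
<tr class="table-dark-row"><td class="snapshot-td2-cp">Index</td><td class="snapshot-td2"><b>-</b></td></tr>
<tr class="table-dark-row"><td class="snapshot-td2-cp">Market Cap</td><td class="snapshot-td2"><b>%s</b></td></tr>
</table></body></html>', cap), path)
  path
}

sources <- c(AAA = make_page("11.58B"), BBB = make_page("2.10B"))

# One value per stock, named by ticker.
caps <- vapply(sources, get_market_cap, character(1))
```

Returning NA when nothing matches keeps one odd page from stopping the whole run. When hitting a live site, a short Sys.sleep() between requests is a polite addition.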


Here is the complete solution:


library(rvest)

html = read_html("")

cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")

# Extract the cell text and strip the "B" (billions) suffix
html_text(cast) %>%
    gsub(pattern = "B", replacement = "")
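
Stripping only the "B" leaves a character string, and it would mishandle values quoted in millions. As a hedged alternative, the sketch below converts such strings to plain numbers in base R. The helper name parse_market_cap and the set of suffixes (K, M, B, T) are assumptions, not something the original answer specifies.

```r
# Hypothetical helper: turn strings such as "11.58B" or "850.2M" into numbers.
parse_market_cap <- function(x) {
  x <- trimws(x)
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12)

  # The last character decides the scale, if it is a known suffix.
  suffix <- toupper(substr(x, nchar(x), nchar(x)))
  has_suffix <- suffix %in% names(multipliers)

  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  scale <- ifelse(has_suffix, multipliers[suffix], 1)

  # Anything that is not a number (for example "-") becomes NA.
  suppressWarnings(as.numeric(digits)) * scale
}

values <- parse_market_cap(c("11.58B", "850.2M", "-"))
```

The result is a numeric vector in plain units, so values from different stocks can be sorted or summed directly.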