AI52487963 - 1 year ago
R Question

Parsing XML with no style information?

I'm trying to do a simple XML parse from the web, but I keep hitting roadblocks. If I try a classic XML parse:

library(XML)
url <- c("")
xml <- xmlTreeParse(url, encoding = "UTF-8", isURL=TRUE)

I get:

Unknown encoding "UTF-8"
Error: 1: Unknown encoding "UTF-8"

This happens even though I did specify the encoding. When I open the XML from the site in a browser, a note across the top says the file doesn't have any style information associated with it, but the document tree is displayed anyway. If I try htmlTreeParse instead,

file <- htmlTreeParse(url, encoding = "UTF-8", isURL=TRUE)

I get:

Error in which(value == defs) :
argument "code" is missing, with no default

Is there something obvious I'm missing here?

Answer

You may find it easier in the long run to move to rvest and xml2:

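library(xml2)
library(rvest)  # rvest also re-exports the %>% pipe

pg <- read_xml("")

# xml_find_all() is xml2's XPath selector; older rvest code used xml_nodes(),
# which rvest 1.0 deprecated
xml_find_all(pg, "//name") %>% xml_text()
xml_find_all(pg, "//description") %>% xml_text()
xml_find_all(pg, "//boardgamehonor") %>% xml_text()

# Only the primary name, i.e. <name primary="true" sortindex="1">
xml_find_all(pg, "//name[@primary='true' and @sortindex=1]") %>% xml_text()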
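Since the URL is elided, here is a minimal, self-contained sketch of what those XPath queries select, run against a small hypothetical document. The `<items>`/`<item>` wrapper and all text values are made up for illustration; only the `name` and `description` element names and the `primary`/`sortindex` attributes come from the queries above. It uses xml2's `xml_find_all()` with plain nested calls, so no pipe is needed.

```r
library(xml2)

# Hypothetical stand-in for the elided URL: structure and values are invented,
# only the element/attribute names mirror the queries in the answer.
doc <- read_xml('<items>
  <item>
    <name primary="true" sortindex="1">Example Game</name>
    <name sortindex="1">Alternate Title</name>
    <description>A short description.</description>
  </item>
</items>')

# Every <name> element, regardless of its attributes
all_names <- xml_text(xml_find_all(doc, "//name"))

# Only <name> elements with primary='true' AND sortindex equal to 1
# (XPath converts the attribute string "1" to a number for the comparison)
primary_names <- xml_text(
  xml_find_all(doc, "//name[@primary='true' and @sortindex=1]")
)

print(all_names)
print(primary_names)
```

`xml_text()` returns an ordinary character vector, so once the queries match the real document the results can go straight into a data frame.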