AI52487963 - 7 months ago
R Question

Parsing XML with no style information?

I'm trying to do a simple XML parse from the web, but I'm hitting some roadblocks. If I try a classic XML parse:

library(XML)
url <- c("")
xml <- xmlTreeParse(url, encoding = "UTF-8", isURL=TRUE)

I get:

Unknown encoding "UTF-8"
Error: 1: Unknown encoding "UTF-8"

This happens even though I specified the encoding explicitly. When I look at the XML from the site, a message across the top says that it doesn't have any style information, but the document tree is displayed anyway. Then, if I try htmlTreeParse instead,

file <- htmlTreeParse(url, encoding = "UTF-8", isURL=TRUE)

I get:

Error in which(value == defs) :
argument "code" is missing, with no default

Is there something obvious I'm missing here?


You may find it easier in the long run to move to rvest and xml2:


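library(xml2)
library(rvest)  # xml_nodes() comes from rvest, which also re-exports %>%

pg <- read_xml("")

# text of every <name> element
xml_nodes(pg, xpath="//name") %>% xml_text()

# text of every <description> element
xml_nodes(pg, xpath="//description") %>% xml_text()

# text of every <boardgamehonor> element
xml_nodes(pg, xpath="//boardgamehonor") %>% xml_text()

# only the <name> marked primary with sort index 1
xml_nodes(pg, xpath="//name[@primary='true' and @sortindex=1]") %>% xml_text()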
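If you want to try the XPath queries without hitting the site, here is a minimal sketch that parses an inline XML string instead of a URL. The sample document and all of its values are invented for illustration; only the element and attribute names are taken from the queries above. It uses xml2's xml_find_all() in place of xml_nodes(), so it needs only xml2:

```r
library(xml2)

# Hypothetical sample document: the values are made up, and the real feed
# may be structured differently.
doc <- read_xml('<boardgames>
  <boardgame objectid="1">
    <name primary="true" sortindex="1">Example Game</name>
    <name sortindex="1">Alternate Title</name>
    <description>A made-up description.</description>
    <boardgamehonor objectid="10">Example Award</boardgamehonor>
  </boardgame>
</boardgames>')

# xml_find_all() returns every node matching the XPath expression;
# xml_text() then extracts the text content of each node.
all_names <- xml_text(xml_find_all(doc, "//name"))

# Predicates in square brackets filter on attributes.
primary_name <- xml_text(
  xml_find_all(doc, "//name[@primary='true' and @sortindex=1]")
)

honors <- xml_text(xml_find_all(doc, "//boardgamehonor"))

print(all_names)
print(primary_name)
print(honors)
```

Once the queries behave as expected on a small sample like this, the same expressions should carry over to the document returned by read_xml() on the real URL, assuming it uses the same element names.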