Frank - 3 months ago
R Question

R XML xpath queries return NULL or list()

I would like to extract data as dataframes from an XML file available under: http://www.uniprot.org/uniprot/P43405.xml

I only get back NULL or an empty list(), although I think the XPath queries are correct.

library(RCurl)
library(XML)
url <- "http://www.uniprot.org/uniprot/P43405.xml"
urldata <- getURL(url)
xmlfile <- xmlParse(urldata)

# some xpath queries
xmlfile["//entry/comment[@type='function']/text"]
xmlfile["//entry/comment[@type='PTM']/text"]

xpathSApply(xmlfile,"//uniprot/entry",xmlGetAttr, 'dataset')
xpathSApply(xmlfile,"//uniprot/entry",xmlValue)


Can anyone help me with this problem?

Thanks, Frank

Answer

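The XPath queries are missing the namespace. The elements in this document belong to the namespace http://uniprot.org/uniprot, so unprefixed names such as entry or comment match nothing. Bind a prefix to that namespace URI and use it on every element step of the query: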

library(RCurl)
library(XML)

url <- "http://www.uniprot.org/uniprot/P43405.xml"
urldata <- getURL(url)
xmlfile <- xmlParse(urldata)

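# Without a namespace prefix, the query matches nothing
getNodeSet(xmlfile, "//entry//comment")

# Bind the prefix "ns" to the namespace URI and use it in the query
namespaces <- c(ns = "http://uniprot.org/uniprot")
getNodeSet(xmlfile, "//ns:entry//ns:comment", namespaces)

# Attribute names (@type) are not namespaced, so they take no prefix
getNodeSet(xmlfile, "//ns:entry//ns:comment[@type='PTM']/ns:text", namespaces)

# The same namespace vector works with xpathSApply
xpathSApply(xmlfile, "//ns:uniprot/ns:entry", xmlGetAttr, "dataset", namespaces = namespaces)
xpathSApply(xmlfile, "//ns:uniprot/ns:entry", xmlValue, namespaces = namespaces)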
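
Since the question asks for data frames, here is a minimal offline sketch of the same idea. The XML string is a made-up, heavily trimmed stand-in for the UniProt file (only the namespace URI and the element names are taken from the queries above), and it assumes every comment has exactly one text child, which may not hold for the real file.

```r
library(XML)

# Hypothetical miniature document using the same default namespace
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <comment type="function"><text>Example function text</text></comment>
    <comment type="PTM"><text>Example PTM text</text></comment>
  </entry>
</uniprot>'

doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed element names do not match elements in a default namespace
no_ns <- getNodeSet(doc, "//entry/comment")

# Bind a prefix of our own choosing ("ns") to the namespace URI
ns <- c(ns = "http://uniprot.org/uniprot")

# Pull the attribute and the text of each comment, then combine them.
# Assumption: one <text> child per <comment>, so both vectors line up.
comment_type <- xpathSApply(doc, "//ns:entry/ns:comment",
                            xmlGetAttr, "type", namespaces = ns)
comment_text <- xpathSApply(doc, "//ns:entry/ns:comment/ns:text",
                            xmlValue, namespaces = ns)

comments_df <- data.frame(type = comment_type,
                          text = comment_text,
                          stringsAsFactors = FALSE)
print(comments_df)
```

On the real file, comments without a text child would make the two vectors differ in length, so it may be safer to loop over the comment nodes and extract both values per node.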

References:

?xpathApply

How can I use xpath querying using R's XML library?