Vanessa - 9 months ago
R Question

xml-tei in R: exclude a value from an attribute with multiple values

The @ana attribute in my XML-TEI file holds multiple values:

<!-- xml-tei -->
<w type="verb" ana="#ŠNS01 #destruction #action #ANT" />


In R, I want to count the verbs (<w> elements) whose @ana contains both #action and #destruction:

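# in R
library(XML)
# doc: the TEI file parsed with xmlParse(); assumed: ns binds the prefix "ns" to the TEI namespace
ns <- c(ns = "http://www.tei-c.org/ns/1.0")
nodes <- getNodeSet(doc, "//ns:w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#destruction')]", ns)
total_actionDes <- length(nodes)
total_actionDes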


But this also counts the <w> elements whose @ana contains #ANT, and I don't want it to.

How can I exclude this value in the getNodeSet() query?

Thank you in advance.

Answer

You can wrap the unwanted condition in the XPath not() function:

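library(XML)
# Minimal test document without the TEI namespace, so the XPath needs no ns: prefix
doc <- xmlParse('<w type="verb" ana="#SNS01 #destruction #action #ANT"/>', asText = TRUE)
# not(contains(...)) drops every <w> whose @ana contains 'ANT', so the result is empty
getNodeSet(doc,"//w[contains(@type,'verb') and contains(@ana,'#action') and contains(@ana, '#destruction') and not(contains(@ana, 'ANT'))]")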
# list()
# attr(,"class")
# [1] "XMLNodeSet"
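Note that contains() is a substring test, so not(contains(@ana, 'ANT')) would also drop a word tagged with, say, #ANTI. Below is a stricter sketch, assuming the values in @ana are always whitespace-separated tokens. It pads the attribute with spaces and looks for the whole token, then counts the matches as in the question. The sample document, the #ANTI tag and the helper has_token() are hypothetical. The namespace URI assumes a standard TEI P5 file, which is why the XPath keeps the question's ns: prefix.

```r
library(XML)

# Hypothetical sample: three TEI <w> elements (the #ANTI tag is invented for illustration)
tei <- '<text xmlns="http://www.tei-c.org/ns/1.0">
  <w type="verb" ana="#SNS01 #destruction #action #ANT"/>
  <w type="verb" ana="#SNS02 #destruction #action"/>
  <w type="verb" ana="#SNS03 #destruction #action #ANTI"/>
</text>'
doc <- xmlParse(tei, asText = TRUE)

# Bind the prefix "ns" to the TEI namespace, as in the question's XPath
ns <- c(ns = "http://www.tei-c.org/ns/1.0")

# Hypothetical helper: builds an XPath 1.0 test that is true only when
# `token` appears as a whole space-separated value of @ana
has_token <- function(token) {
  sprintf("contains(concat(' ', normalize-space(@ana), ' '), ' %s ')", token)
}

xpath <- sprintf(
  "//ns:w[contains(@type, 'verb') and %s and %s and not(%s)]",
  has_token("#action"), has_token("#destruction"), has_token("#ANT")
)
total_actionDes <- length(getNodeSet(doc, xpath, ns))
total_actionDes  # [1] 2: only the #ANT word is excluded

# For comparison, the substring test also drops the #ANTI word
substring_count <- length(getNodeSet(
  doc,
  "//ns:w[contains(@type, 'verb') and contains(@ana, '#action') and contains(@ana, '#destruction') and not(contains(@ana, 'ANT'))]",
  ns
))
substring_count  # [1] 1
```

The padding trick is the usual XPath 1.0 workaround for the missing tokenize() function. The XML package evaluates XPath through libxml2, which only supports XPath 1.0.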