Henley Chiu - 1 year ago
Ruby Question

Why does .css work with this Nokogiri object but not XPath?

Why does the CSS selector return the correct info, but the XPath does not?

source = "<hgroup class='page-header channel post-head' data-channel='tech' data-section='sec0=tech&amp;sec1=index&amp;sec2='><h2>Tech</h2></hgroup>"

doc = Nokogiri::HTML(source)
doc.xpath('//hgroup[case_insensitive_equals(@class,"post-head")]//h2', XpathFunctions.new)
=> []

=> [#<Nokogiri::XML::Element:0x6c2b824 name="h2" children=[#<Nokogiri::XML::Text:0x6c2b554 "Tech">]>]

Answer Source

Assuming case_insensitive_equals does what its name suggests, the XPath fails because the class attribute isn’t equal to post-head (case-insensitively or not), although it does contain it. XPath treats a class attribute as a plain string; it doesn’t split the value and handle the classes individually as CSS does.

A simple XPath that would work is:

doc.xpath('//hgroup[contains(@class, "post-head")]//h2')

(I’ve removed the custom function; you will need to write your own to do this case-insensitively.)

The contains() test isn’t quite the same as the CSS selector though, as it will also match any class that merely contains the string, such as not-post-head. A more complete XPath would be something like this:

doc.xpath('//hgroup[contains(concat(" ", normalize-space(@class), " "), " post-head ")]//h2')