Henley Chiu - 1 month ago
Ruby Question

Why does .css work with this Nokogiri object but not XPath?

Why does the CSS selector return the correct info, but the XPath does not?

require 'nokogiri'

source = "<hgroup class='page-header channel post-head' data-channel='tech' data-section='sec0=tech&amp;sec1=index&amp;sec2='><h2>Tech</h2></hgroup>"

doc = Nokogiri::HTML(source)
# XpathFunctions: custom handler that defines case_insensitive_equals (definition not shown)
doc.xpath('//hgroup[case_insensitive_equals(@class,"post-head")]//h2', XpathFunctions.new)
=> []

=> [#<Nokogiri::XML::Element:0x6c2b824 name="h2" children=[#<Nokogiri::XML::Text:0x6c2b554 "Tech">]>]


Assuming case_insensitive_equals does what its name suggests, the problem is that the class attribute's value, page-header channel post-head, isn't equal to post-head (case-insensitively or not); it only contains it. XPath treats the class attribute as a plain string: it doesn't split it into individual classes and check each one the way CSS does.
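To make the difference concrete, here is a small pure-Ruby sketch (no Nokogiri needed) that compares the attribute value from the question the way XPath's `=` sees it (one whole string) with the way a CSS class selector sees it (a whitespace-separated list of tokens). The helper names `xpath_equals?` and `css_has_class?` are hypothetical, chosen only for illustration.

```ruby
# The class attribute value from the question's markup.
CLASS_ATTR = "page-header channel post-head"

# XPath-style comparison: @class = "..." compares the whole attribute
# value as a single string.
def xpath_equals?(attr_value, name)
  attr_value == name
end

# CSS-style comparison: .post-head matches when "post-head" is one of the
# whitespace-separated tokens in the class attribute.
def css_has_class?(attr_value, name)
  attr_value.split.include?(name)
end

puts xpath_equals?(CLASS_ATTR, "post-head")  # prints false
puts css_has_class?(CLASS_ATTR, "post-head") # prints true
```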

A simple XPath that would work is:

doc.xpath('//hgroup[contains(@class, "post-head")]//h2')

(I’ve removed the custom function; you will need to write your own to do this case-insensitively.)
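One possible shape for such a function, as a sketch rather than a definitive implementation: Nokogiri lets you pass an object as an extra argument to `#xpath`, and XPath function calls whose names match a method on that object are dispatched to it. An attribute argument such as `@class` arrives as a node set of attribute nodes, and a string literal arrives as a Ruby String. The class name `ClassFunctions` and method name `has_class_ci` are hypothetical. Unlike a plain equality check, this version splits the attribute into class tokens, as CSS does, and compares each token case-insensitively.

```ruby
# Hypothetical handler object for a custom XPath function.
# XPath function names can't end in "?", hence has_class_ci rather than
# a more idiomatic Ruby predicate name.
class ClassFunctions
  # Returns true if any attribute node in node_set has a
  # whitespace-separated token equal to name, ignoring case.
  # node_set only needs to be Enumerable and yield objects with #value,
  # which a Nokogiri::XML::NodeSet of Attr nodes satisfies.
  def has_class_ci(node_set, name)
    node_set.any? do |attr|
      attr.value.split.any? { |token| token.casecmp?(name) }
    end
  end
end
```

With Nokogiri loaded, usage would look something like `doc.xpath('//hgroup[has_class_ci(@class, "POST-HEAD")]//h2', ClassFunctions.new)`, which should then match the h2 regardless of the case used in the query.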

This isn’t quite equivalent, though, as it will also match classes such as not-post-head. A more complete XPath would be something like this:

doc.xpath('//hgroup[contains(concat(" ", normalize-space(@class), " "), " post-head ")]//h2')
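The effect of the padding can be checked without a parser. The sketch below roughly emulates the two XPath expressions in plain Ruby: `contains(@class, "post-head")` becomes a substring test, and the padded version normalizes whitespace (approximated here with `split.join(" ")`), wraps the value in spaces and looks for `" post-head "`. The method names are hypothetical.

```ruby
# Rough emulation of contains(@class, name): a plain substring test.
def xpath_contains?(attr_value, name)
  attr_value.include?(name)
end

# Rough emulation of
#   contains(concat(" ", normalize-space(@class), " "), " name ")
# normalize-space trims and collapses whitespace; padding both sides with
# a single space means only a whole class token can match.
def xpath_padded_contains?(attr_value, name)
  normalized = attr_value.split.join(" ")
  " #{normalized} ".include?(" #{name} ")
end

["page-header channel post-head", "not-post-head", "  post-head\tfoo "].each do |value|
  puts "#{value.inspect}: contains=#{xpath_contains?(value, 'post-head')}, " \
       "padded=#{xpath_padded_contains?(value, 'post-head')}"
end
```

Only "not-post-head" differs: the plain substring test accepts it, the padded test rejects it. Nokogiri's own CSS-to-XPath translation of a class selector has traditionally used this same concat/normalize-space pattern, which is why the .css query works; you can inspect the translation for your selector with `Nokogiri::CSS.xpath_for`.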