rogal111 - 5 months ago

:has CSS pseudo class in Nokogiri

I'm looking for the :has pseudo-class in Nokogiri. It should work just like jQuery's :has selector.

For example:

<li><h1><a href="dfd">ex1</a></h1><span class="string">sdfsdf</span></li>
<li><h1><a href="dsfsdf">ex2</a></h1><span class="string"></span></li>
<li><h1><a href="sdfd">ex3</a></h1></li>


The CSS selector should return only the first link, the one whose li also contains a non-empty span.string.

In jQuery this selector works well:

$('li:has(span.string:not(:empty))>h1>a')


but not in Nokogiri:

Nokogiri::HTML(html_source).css('li:has(span.string:not(:empty))>h1>a')


:not and :empty work well, but :has does not.





  1. Is there any documentation for CSS selectors in Nokogiri?

  2. Maybe someone can write a custom :has pseudo-class? Here is an example of how to write a :regexp selector.

  3. Optionally I can use XPath. How do I write XPath for li:has(span.string:not(:empty))>h1>a?


Answer

The problem with Nokogiri's current implementation of :has() is that it creates XPath that requires the contents to be a direct child, not any descendant:

puts Nokogiri::CSS.xpath_for( "a:has(b)" )
#=> "//a[b]"
#=> Should output "//a[.//b]" to be correct

To match what jQuery does, the XPath must allow the inner element to be any descendant, not just a direct child. For example:

require 'nokogiri'
d = Nokogiri.XML('<r><a/><a><b><c/></b></a></r>')
d.at_css('a:has(b)')    #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>
d.at_css('a:has(c)')    #=> nil
d.at_xpath('//a[.//c]') #=> #<Nokogiri::XML::Element:0x14dd608 name="a" children=[#<Nokogiri::XML::Element:0x14dd3e0 name="b" children=[#<Nokogiri::XML::Element:0x14dd20c name="c">]>]>

For your specific case, here's the full "broken" XPath:

puts Nokogiri::CSS.xpath_for( "li:has(span.string:not(:empty)) > h1 > a" )
#=> //li[span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

And here it is fixed:

# Adding just the .//
//li[.//span[contains(concat(' ', @class, ' '), ' string ') and not(not(node()))]]/h1/a

# Simplified to assume only one CSS class is present on the span
//li[.//span[@class='string' and not(not(node()))]]/h1/a

# Assuming that `not(:empty)` really meant "Has some text in it"
//li[.//span[@class='string' and text()]]/h1/a

# ..or maybe you really wanted "Has some text anywhere underneath"
//li[.//span[@class='string' and .//text()]]/h1/a

# ..or maybe you really wanted "Has at least one element child"
//li[.//span[@class='string' and *]]/h1/a