Jackson Cunningham - 2 months ago
Ruby Question

How to get all elements via CSS class

I am trying to scrape this page using Nokogiri to get all the elements with class name of "teaser".

If I check the page with jQuery, I can see there are 25 elements:

$(".teaser").length // => 25


However, when using Nokogiri, I only get the first teaser:

teasers = doc.css('.teaser')
teasers.count # => 1


Where am I going wrong? How do I get all the teasers?

Answer

That document appears to contain a number of null bytes for some reason, and this causes Nokogiri/LibXML to treat the document as finished partway through, so only the elements before the first null are parsed.
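
One way to check for this is to count the null bytes in the raw text before handing it to the parser. The following is a minimal sketch: the `page` string is a made-up stand-in for the downloaded HTML, not the real page, and the variable names are only illustrative.

```ruby
# Made-up stand-in for the downloaded HTML; the real page text would go here.
page = "<div class=\"teaser\">one</div>\0\0<div class=\"teaser\">two</div>"

null_count = page.count("\0")   # how many null bytes the string contains
first_null = page.index("\0")   # offset of the first null byte, or nil if none

puts "null bytes: #{null_count}, first at offset #{first_null.inspect}"
```

A non-zero count, with the first offset falling just after the last element Nokogiri returned, would be consistent with the explanation above.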

You should be able to fix it by preprocessing the contents to remove the nulls. If page contains the text of the webpage:

page.gsub!(/\x00/, '')

Then use Nokogiri on page as before.
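
Put together, the cleanup might look like the sketch below. The helper name `strip_null_bytes` and the sample string are hypothetical, introduced only for illustration. It uses `String#delete` rather than `gsub!`, which does the same job here but returns a new string. The Nokogiri calls are left as comments so the snippet runs without the gem installed.

```ruby
# Hypothetical helper (not from the original answer): returns a copy of
# the HTML with every null byte removed, leaving the input untouched.
def strip_null_bytes(html)
  html.delete("\0")
end

# Made-up sample standing in for the downloaded page: a null byte sits
# between the first and second teaser.
raw = "<div class=\"teaser\">one</div>\0<div class=\"teaser\">two</div>"

clean = strip_null_bytes(raw)

puts raw.include?("\0")    # true
puts clean.include?("\0")  # false

# With the nokogiri gem installed, parse the cleaned string as before:
#   require 'nokogiri'
#   doc = Nokogiri::HTML(clean)
#   teasers = doc.css('.teaser')
```

Returning a copy keeps the original response body available for debugging. `gsub!` or `delete!` would modify the string in place instead.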