I am trying to scrape this page using Nokogiri to get all the elements with the class "teaser".
If I check the page with jQuery, I can see there are 25 elements:
$(".teaser").length => 25
But Nokogiri only finds one:
teasers = doc.css('.teaser')
teasers.count => 1
That document appears to contain a lot of null bytes for some reason, and this is causing Nokogiri/libxml2 to assume the document has ended partway through.
You should be able to fix it by preprocessing the contents to remove the nulls. If page contains the text of the webpage:
page.gsub!(/\x00/, '')
Then use Nokogiri on page as before.
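To illustrate the preprocessing step in isolation, here is a minimal sketch. The helper name strip_nulls and the sample string are made up for the example, and it uses String#delete rather than gsub!, which does the same job for a single byte. The Nokogiri calls are left as comments so the snippet runs without the gem.

```ruby
# Hypothetical helper (not part of Nokogiri): returns a copy of the raw
# HTML with every NUL byte removed, leaving the original string untouched.
def strip_nulls(html)
  html.delete("\0")
end

# Made-up stand-in for a fetched page body with NUL bytes mid-document.
raw = "<div class=\"teaser\">one</div>\0\0<div class=\"teaser\">two</div>"
clean = strip_nulls(raw)

puts raw.count("\0")    # NUL bytes before cleaning
puts clean.count("\0")  # NUL bytes after cleaning

# With the nokogiri gem installed, parse the cleaned string as before:
#   doc = Nokogiri::HTML(clean)
#   teasers = doc.css('.teaser')
```

If you would rather not keep two copies of a large page around, page.delete!("\0") modifies the string in place, like the gsub! call above.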