srboisvert - 9 days ago
Ruby Question

Web Scraping with Nokogiri::HTML and Ruby - How do you handle when what you are looking for isn't there?

I've got a script that works for 99% of the pages I want to scrape, but a few of them don't have what I am looking for, and the script errors out with:

undefined method `attribute' for nil:NilClass (NoMethodError)


The code is a bit ugly from fiddling around and debugging, but here is what I am doing. The error is on the third line and occurs simply because, in the failing cases, there is no .entry-content img:

doc = Nokogiri::HTML(open(url))
image_link = doc.css(".entry-content img")
temp = image_link.attribute('src').to_s


How can I detect and handle this case, given that the image_link returned by Nokogiri isn't nil even when nothing matches?

Answer
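require 'nokogiri'
require 'open-uri'

# URI.open (from open-uri) is needed on Ruby 3.0+, where plain open(url) no longer fetches URLs
doc = Nokogiri::HTML(URI.open(url))
# at_css returns the first matching node, or nil when nothing matches
if image_link = doc.at_css(".entry-content img")
  temp = image_link['src']
else
  # Whatever else
end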

Alternatively, you could use an XPath selector to get the attribute value directly:

doc = Nokogiri::HTML('<div class="entry-content"><img src="bar"></div>')
src = doc.at_xpath('//*[@class="entry-content"]//img/@src').to_s
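# src is "bar"; if the HTML had no such element, at_xpath would return nil and src would be "" (nil.to_s)
# Note: @class="entry-content" matches only when the class attribute is exactly that string,
# unlike the CSS selector .entry-content, which also matches elements with additional classes.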