Iwo Iwo - 6 months ago 29
Ruby Question

Mechanize won't connect to site

Hello, I have a problem: the mechanize gem won't connect to a site. The gem is installed.
Code:

require 'mechanize'

agent = Mechanize.new
main_page = agent.get 'https://imbd.com'
list_page = main_page.link_with(text: "Top 250").click
rows = list_page.root.css(".lister-list tr")

puts rows.size


And this is the error:

C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `initialize': A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) for "imbd.com" port 80 (Errno::ETIMEDOUT)
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `open'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `block in connect'
from C:/Ruby/lib/ruby/2.2.0/timeout.rb:73:in `timeout'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:878:in `connect'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:863:in `do_start'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:858:in `start'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:267:in `fetch'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get'
from C:/Ruby/Workspace/imbd.rb:4:in `<main>'


Does anyone have any idea what's wrong? Thanks!

Answer

While it's true that Mechanize doesn't support JavaScript, your actual problem is a typo in the host name: your code requests imbd.com instead of imdb.com. Nothing responds at the misspelled address, so the connection attempt times out, and the error message is accurate.
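
One way to catch this kind of mistake before Mechanize gets involved is to check that the host accepts a TCP connection at all. The sketch below uses only Ruby's standard library; `host_reachable?` is a hypothetical helper name chosen for illustration, not part of Mechanize, and the 5-second default timeout is an arbitrary assumption.

```ruby
require 'socket'
require 'uri'

# Hypothetical helper (not part of Mechanize): returns true if a TCP
# connection to the URL's host and port can be opened within `timeout`
# seconds, and false if the attempt is refused, times out, or the
# host name cannot be resolved.
def host_reachable?(url, timeout: 5)
  uri = URI.parse(url)
  # Socket.tcp closes the socket when the block returns and
  # hands back the block's value.
  Socket.tcp(uri.host, uri.port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # SystemCallError covers the Errno::* family (ETIMEDOUT, ECONNREFUSED, ...);
  # SocketError covers failed name resolution.
  false
end
```

Calling `host_reachable?` with the URL from your script before `agent.get` would probably return false after the timeout, which points at the address rather than at Mechanize. It only tests connectivity, so a true result does not guarantee that the page itself will load.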

And FWIW, IMDb doesn't want you to scrape their site:

Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below.
