Ruby Question

My attempts at building the simplest web crawler w/Capybara are failing. What am I doing wrong?

[Warning: Ranting ahead. Please don't edit out the ranting. I want to know whether what I'm encountering is normal. These sorts of obstacles absolutely ruin my day as a developer. They're more difficult than whatever business problem I've actually set out to solve.]

Capybara. Mechanize. Nokogiri. Selenium. Et cetera.

I've tried to build the simplest little Ruby program that does the following:

1. Opens a web browser
2. Navigates to a website
3. Clicks a link

. . . but have had basically no success.**

Here's what I've tried:

crawler.rb

require "capybara"
require "capybara/dsl"

class Crawler
  include Capybara::DSL

  def initialize
    visit "http://www.google.com"
  end
end

crawler = Crawler.new


When I run that code, I get this error:

rack-test requires a rack application, but none was given (ArgumentError)


I read somewhere (not in the documentation) that this should fix it:

require "capybara"
require "capybara/dsl"

class Crawler
  include Capybara::DSL

  def initialize
    Capybara.default_driver = :selenium
    visit "http://www.google.com"
  end
end

crawler = Crawler.new


Then, when I solve that error, I get another one related to some other dependency:

Unable to find Mozilla geckodriver. Please download the server from https://github.com/mozilla/geckodriver/releases and place it somewhere on your PATH. More info at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver. (Selenium::WebDriver::Error::WebDriverError)


I download the driver, have no clue as to how to actually install the thing despite reading and following another set of elliptical directions, but already have the distinct sense that I'm down a path of yak-shaving that won't yield any fruit, because all I want to do is get Ruby to go to a stupid web page and click a stupid link.

I'm not trying to run this code as part of a test. I literally just want Ruby to open a web browser (that I can see) using Capybara (or whatever tool gets the job done, though preferably Capybara) and to do my bidding. But this for whatever reason is EXTREMELY difficult, even though it's apparently been done a billion times.

Guys/Gals, what am I doing wrong here? It's stuff like this that chews up way too much time whenever I'm trying to so much as test a simple idea.

** It's absolutely infuriating--especially so because you'd think it would be as straightforward as following a given gem's documentation. But, generally speaking, I found that gems are elliptically documented. About 90% of the time, I have to go to Stackoverflow or google someone's tutorial in order to learn how to do the most basic shit with popular gems like the ones above, because they seldom just work. There's just about always some crazy gymnastics that, were it not for the assistance of others, I'd have zero clue as to how to overcome.

Sorry--that's just a general gripe about open-source software. I'm not even a junior developer, and I find sometimes that I need to spend HOURS just getting a gem to do whatever basic thing it's supposed to do.

Answer

selenium-webdriver recently released 3.0.0, which defaults to using geckodriver with Firefox (the browser Capybara defaults to), but that combination is currently missing some functionality. Instead, I would recommend using Chrome with chromedriver for your use case. You will need to download the latest version of chromedriver and put it somewhere on your PATH.
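If you're not sure whether the chromedriver binary is actually visible to the Ruby process, a quick sanity check along these lines can help (this is just an illustrative PATH lookup, not part of Capybara or Selenium):

# Illustrative check: look for an executable named "chromedriver" on PATH.
driver = ENV["PATH"].split(File::PATH_SEPARATOR)
                    .map { |dir| File.join(dir, "chromedriver") }
                    .find { |path| File.executable?(path) }
puts driver ? "chromedriver found at #{driver}" : "chromedriver not found on PATH"

Once chromedriver resolves, register a Chrome-backed driver and make it the default: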

require "capybara/dsl"
require "selenium-webdriver"

Capybara.register_driver :crawler_driver do |app|
  Capybara::Selenium::Driver.new(app, :browser => :chrome)
end
Capybara.default_driver = :crawler_driver

class Crawler
  include Capybara::DSL

  def initialize
    visit "http://www.google.com"
  end
end

crawler = Crawler.new

That should do what you're trying to do. You're going to have issues as soon as you create another Crawler instance, though, since both instances will be using the same Capybara session and will conflict. If you're not going to create multiple instances, you're fine; if you are, you'll want to create a new Capybara::Session in each Crawler instance and call all Capybara methods on that session object rather than including Capybara::DSL into your class, which would look more like this:

class Crawler
  def initialize
    @session = Capybara::Session.new(:crawler_driver)
    @session.visit "http://www.google.com"
  end
end
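
For the link-clicking part of your original goal, the session object exposes the same actions as the DSL. A minimal sketch along those lines (the "About" link text is just an assumed example, not something from your post):

class Crawler
  def initialize
    @session = Capybara::Session.new(:crawler_driver)
  end

  # Visit a page and click a link by its visible text.
  def visit_and_click(url, link_text)
    @session.visit(url)
    @session.click_link(link_text)
  end
end

crawler = Crawler.new
crawler.visit_and_click("http://www.google.com", "About") # "About" is an assumed link text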