pingu pingu - 5 months ago 11
Ruby Question

Using Mechanize gem to return a collection of links based on their position in the DOM

I am struggling with Mechanize. I want to "click" a set of links that can only be identified by their position (all links within div#content) or by their href.

I have tried both of these identification methods without success.

From the documentation, I could not figure out how to return a collection of links (for clicking) based on their position in the DOM rather than on attributes of the link itself.

Secondly, the documentation suggests that you can use :href to match a partial href,

page = agent.get('http://foo.com/').links_with(:href => "/something")


but the only way I can get it to return a link is by passing a fully qualified URL, e.g.

page = agent.get('http://foo.com/').links_with(:href => "http://foo.com/something/a")


This is not very useful if I want to return a collection of links with hrefs such as

http://foo.com/something/a
http://foo.com/something/b
http://foo.com/something/c
etc...


Am I doing something wrong? Do I have unrealistic expectations?

Answer

Part II: By default, a string passed to :href has to match the href exactly. So the "/something" in your example would only match <a href="/something"></a> and not <a href="http://foo.com/something/a"></a>.

What you want to do is pass in a regex, so that it matches a substring of the href. Like so:

page = agent.get('http://foo.com/').links_with(:href => %r{/something/})
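To see why a string and a regex behave differently, here is a minimal plain-Ruby sketch. It assumes (as I understand it, but check your Mechanize version) that links_with compares each criterion against the attribute with Ruby's === operator. The match_hrefs helper and the sample hrefs are made up for illustration and are not part of Mechanize.

```ruby
# Sample hrefs, as they might appear on the page (illustrative data only).
hrefs = [
  'http://foo.com/something/a',
  'http://foo.com/something/b',
  'http://foo.com/other',
  '/something'
]

# Hypothetical stand-in for the filtering step: keep every href for which
# `criterion === href` is true.
def match_hrefs(hrefs, criterion)
  hrefs.select { |href| criterion === href }
end

exact   = match_hrefs(hrefs, '/something')     # String#=== tests equality
partial = match_hrefs(hrefs, %r{/something/})  # Regexp#=== tests for a match

puts exact.inspect
puts partial.inspect
```

With a string, only the href that is exactly "/something" survives; with the regex, the two hrefs containing "/something/" are kept.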

edit: Part I: To select only the links inside a particular element, my first thought was to chain a Nokogiri-style search call in front of links_with. Like this:

page = agent.get('http://foo.com/').search("div#content").links_with(:href => %r{/something/})    # **

OK, that doesn't work: agent.get('http://foo.com/').search("div#content") returns a Nokogiri object (a node set) instead of a Mechanize one, so links_with is not available on it. However, you can still extract the links from the Nokogiri object using its css method. I would suggest something like:

page = agent.get('http://foo.com/').search("div#content").css("a")

If that doesn't work, I'd suggest checking out http://nokogiri.org/tutorials
