Rubyx - 9 months ago
Ruby Question

How to avoid an interval with Mechanize

I'm trying to scrape Craigslist with Mechanize. I wrote this:

require 'mechanize'

a = Mechanize.new
page = a.get("")
i = 0
list_per_page = 99
while i <= list_per_page do
  title = page.search(".hdrlnk")[i].text
  price = page.search(".price")[i].text
  puts title
  puts price
  puts "-----------"
  i += 1
end

It works, but when a listing has no price there is an interval. I think it's because I use
but I don't know what I have to do to avoid the interval. Any ideas?


On Craigslist there is:

listing_title1 -> $100
listing_title2 -> $200
listing_title3 ->
listing_title4 -> $60
listing_title5 -> $150

My output CSV displays:

listing_title1 -> $100
listing_title2 -> $200
listing_title3 -> $60
listing_title4 -> $150
listing_title5 -> $300

($300 is the price of listing_title6.)

Answer Source

If by 'interval' you mean the blank line that is printed when the listing doesn't have a price, you could fix this by making the puts conditional:

puts price unless price.empty?


If I understand right, your hdrlnk and price entries are getting out of sync with each other. This most likely happens because a listing without a price has no .price element at all, so the list of prices is shorter than the list of titles and every later price moves up one position.
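To make the drift concrete, here is a minimal sketch that uses plain arrays in place of the two separate search results. The values are the ones from the question; the arrays are only an illustration, not real scraper output.

```ruby
# Stand-ins for the results of searching ".hdrlnk" and ".price" separately.
titles = %w[listing_title1 listing_title2 listing_title3
            listing_title4 listing_title5 listing_title6]
# listing_title3 has no price element, so this list is one entry shorter.
prices = ["$100", "$200", "$60", "$150", "$300"]

# Pairing by index attaches each price to whichever title shares its position.
pairs = titles.each_index.map { |i| [titles[i], prices[i]] }

pairs.each { |title, price| puts "#{title} -> #{price}" }
```

From listing_title3 onward, each title is paired with the price of the listing after it, and the last title ends up with no price at all.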

The best way around this is to find a container that includes both the price and the hdrlnk, and iterate over those containers instead of over the hdrlnk and price entries separately. On this page that would be the .row element, which contains all the info for each search result. So something like this would work:

page.search(".row").each do |row|
  title = row.search(".hdrlnk").first
  price = row.search(".price").first
  puts title.text if title
  puts price.text if price
  puts "------------"
end
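If you need the pairs for a CSV rather than printed output, the same per-row idea can return one title/price pair per listing, with nil where the price is missing. The sketch below is only an illustration: listing_pairs, FakeRow and FakeNode are hypothetical names, and the two structs stand in for parsed nodes so the example runs without a network request. It assumes each row answers at(selector) with the first match or nil, as Nokogiri nodes do.

```ruby
# Hypothetical stand-ins for parsed nodes, so the sketch runs offline.
FakeNode = Struct.new(:text)
FakeRow = Struct.new(:fields) do
  # Returns the stored node for a selector, or nil when there is none.
  def at(selector)
    fields[selector]
  end
end

# Builds one [title, price] pair per row; price is nil when the row has none.
def listing_pairs(rows)
  rows.map do |row|
    title = row.at(".hdrlnk")
    price = row.at(".price")
    [title&.text, price&.text]
  end
end

rows = [
  FakeRow.new({ ".hdrlnk" => FakeNode.new("listing_title2"), ".price" => FakeNode.new("$200") }),
  FakeRow.new({ ".hdrlnk" => FakeNode.new("listing_title3") }),
  FakeRow.new({ ".hdrlnk" => FakeNode.new("listing_title4"), ".price" => FakeNode.new("$60") })
]

listing_pairs(rows).each do |title, price|
  puts "#{title} -> #{price || 'no price'}"
end
```

With a real page you would pass the row nodes themselves instead of the stand-ins. Because each price is looked up inside its own row, a missing price stays attached to the right title instead of shifting the rest of the list.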