Mughal Walana - 6 months ago
Ruby Question

JSON parsing error in script text ruby

I am trying to parse the JSON embedded in a script block that contains store data. The script is on the page http://www.buildbase.co.uk/storefinder . The script text I am working with is at http://pastebin.com/embed_js/3cnewiSh and my code is as follows:

stores_url = "http://www.buildbase.co.uk/storefinder"
mechanize = Mechanize.new
stores_page = mechanize.get(stores_url)
stores_script_txt = stores_page.search("//script[contains(text(), 'storeLocator.initialize(')]")[0].text
stores_jsons = stores_script_txt.split("storeLocator.initialize( $.parseJSON('{\\\"all\\\":")[-1].split(",\\\"selected\\\":0}') ,\tfalse);\n });")[0]
puts stores_jsons
stores_result = JSON.parse(stores_jsons)


JSON.parse gives me the following error:

from /home/private/.rvm/gems/ruby-2.1.5/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
from /home/private/.rvm/gems/ruby-2.1.5/gems/json-1.8.3/lib/json/common.rb:155:in `parse'
from (irb):240
from /home/private/.rvm/rubies/ruby-2.1.5/bin/irb:11:in `<main>'


I don't know where I am going wrong, because the JSON string seems valid to me.

Answer

There were a couple of problems. First, the text you extracted isn't valid JSON as it stands: it sits inside a JavaScript string literal, so every double quote shows up as \" instead of a plain ".

Second, it had HTML tags in it, and the quotes inside those tags broke the quoting of the actual JSON. I grabbed a snippet that just strips out the tags.

I don't know how much of the data you need, but the following code does work. I am also not sure how robust it is (e.g., I just substituted " for any \").

require 'mechanize'
require 'json'

stores_url = "http://www.buildbase.co.uk/storefinder"
mechanize = Mechanize.new
stores_page = mechanize.get(stores_url)
stores_script_txt = stores_page.search("//script[contains(text(), 'storeLocator.initialize(')]")[0].text
# Keep only the array that follows "all": inside the $.parseJSON('...') argument.
stores_jsons = stores_script_txt.split("storeLocator.initialize( $.parseJSON('{\\\"all\\\":")[-1].split(",\\\"selected\\\":0}') ,\tfalse);\n        });")[0]
# Turn \" into ", strip HTML tags, collapse blank lines, and drop leading/trailing newlines.
stores_jsons = stores_jsons.gsub('\"', '"').gsub(/<\/?[^>]*>/, '').gsub(/\n\n+/, "\n").gsub(/^\n|\n$/, '')
stores_result = JSON.parse(stores_jsons)
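
If you want to see the two problems without hitting the site, here is a minimal offline sketch. The sample script text, the store fields, and the variable names are made up for illustration and are not the real page content. It also pulls out the $.parseJSON('...') argument with a regex rather than the whitespace-sensitive split, which is my own substitution and assumes the data never contains the sequence ') itself. Note that this version parses the whole object, so the stores end up under the "all" key.

```ruby
require 'json'

# Hypothetical stand-in for the <script> text. The JSON lives inside a
# JavaScript string literal, so every double quote is written as \" and
# one value carries an HTML tag with its own (escaped) quotes.
sample_script_txt = <<'JS'
$(function() {
  storeLocator.initialize( $.parseJSON('{\"all\":[{\"name\":\"Store A\",\"info\":\"<a href=\"a.html\">Details</a>\"}],\"selected\":0}') , false);
});
JS

# Capture everything between $.parseJSON(' and the first ') that follows.
raw = sample_script_txt[/\$\.parseJSON\('(.*?)'\)/m, 1]

# Problem 1: turn the backslash-escaped quotes into plain quotes.
unescaped = raw.gsub('\"', '"')

# Problem 2: the tag's quotes now end the JSON string early, so parsing
# at this point still fails.
naive_error =
  begin
    JSON.parse(unescaped)
    nil
  rescue JSON::ParserError => e
    e
  end

# Stripping the tags removes the stray quotes and leaves valid JSON.
cleaned = unescaped.gsub(/<\/?[^>]*>/, '')
stores = JSON.parse(cleaned)

puts stores['all'].first['name']
```

Rescuing JSON::ParserError like this is also a handy way to read the parser's message while debugging, since it usually points at the text where parsing stopped.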