rjdel rjdel - 1 year ago 77
JSON Question

JSON.Dump doesn't capture the whole stream

So I have a simple crawler that crawls 3 store location pages and parses the locations of the stores to json. I print(app_data['stores']) and it prints all three pages of stores. However, when I try to write it out I only get one of the three pages, at random, written to my json file. I'd like everything that streams to be written to the file. Any help would be great. Here's the code:

import scrapy
import json
import js2xml

from pprint import pprint

class StlocSpider(scrapy.Spider):
name = "stloc"
allowed_domains = ["bestbuy.com"]
start_urls = (

def parse(self, response):
js = response.xpath('//script[contains(.,"window.appData")]/text()').extract_first()
jstree = js2xml.parse(js)
# print(js2xml.pretty_print(jstree))

app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
app_data = js2xml.make_dict(app_data_node)

for store in app_data['stores']:
yield store

with open('stores.json', 'w') as f:
json.dump(app_data['stores'], f, indent=4)

Answer Source

You are opening the file for writing every time, but you want to append. Try changing the last part to this:

with open('stores.json', 'a') as f:
    json.dump(app_data['stores'], f, indent=4)

Where 'a' opens the file for appending.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download