I have a simple Scrapy crawler that crawls three store-location pages and parses each page's stores to JSON. When I print(app_data['stores']) it prints the stores from all three pages. However, when I try to write them out, only one of the three pages, at random, ends up in my JSON file. I'd like every page the spider parses to be written to the file. Any help would be great. Here's the code:
import json

import js2xml
import scrapy


class StoreLocationsSpider(scrapy.Spider):
    name = "stloc"
    allowed_domains = ["bestbuy.com"]
    start_urls = (
        # the three store-location URLs (elided)
    )

    def parse(self, response):
        # Grab the inline <script> that assigns window.appData
        js = response.xpath('//script[contains(., "window.appData")]/text()').extract_first()
        jstree = js2xml.parse(js)
        # xpath() returns a list of matching nodes; take the first
        app_data_node = jstree.xpath('//assign[left//identifier[@name="appData"]]/right/*')[0]
        app_data = js2xml.make_dict(app_data_node)
        for store in app_data['stores']:
            with open('stores.json', 'w') as f:
                json.dump(app_data['stores'], f, indent=4)
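For reference, the js2xml step is just pulling out the object literal assigned to window.appData. A stdlib-only sketch of the same idea, using hypothetical sample data and a regex plus json.loads (this shortcut only works when the assigned value happens to be valid JSON, which js2xml does not require):

```python
import json
import re

# Hypothetical inline script of the kind the spider extracts.
js = 'window.appData = {"stores": [{"name": "Store A"}, {"name": "Store B"}]};'

# js2xml can parse arbitrary JavaScript; this regex fallback assumes
# the assigned literal is plain JSON.
match = re.search(r'window\.appData\s*=\s*(\{.*\});', js, re.DOTALL)
app_data = json.loads(match.group(1))

print([store["name"] for store in app_data["stores"]])  # ['Store A', 'Store B']
```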
You are opening the file in write mode ('w') every time parse() runs, and 'w' truncates the file, so each response overwrites the previous one. Since Scrapy processes the responses concurrently, which page survives is effectively random. You want to append instead. Try changing the last part to this:

    with open('stores.json', 'a') as f:
        json.dump(app_data['stores'], f, indent=4)

'a' opens the file for appending. One caveat: repeated json.dump calls in append mode give you three JSON documents back to back, not one valid JSON array. If you need a single array, consider yielding the stores as items and letting Scrapy's feed exports write the file (e.g. scrapy crawl stloc -o stores.json).
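The difference between the two modes is easy to demonstrate outside Scrapy. A small sketch with throwaway file names and stand-in page data:

```python
import json
import os

pages = [["Store A"], ["Store B"], ["Store C"]]  # stand-ins for the three responses
results = {}

for mode in ("w", "a"):
    path = f"stores_{mode}.json"  # throwaway demo file
    if os.path.exists(path):
        os.remove(path)
    # One json.dump per page, as in the spider's parse() callback.
    for page in pages:
        with open(path, mode) as f:
            json.dump(page, f)
    with open(path) as f:
        results[mode] = f.read()
    os.remove(path)

print(results["w"])  # 'w' truncates on every open: only the last page survives
print(results["a"])  # 'a' appends: all pages, as three JSON docs back to back
```

This also shows the caveat above: the appended file is a concatenation of JSON documents, so json.load cannot read it back in one call.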