boson boson - 5 months ago 149
JSON Question

Memory issues while parsing json file in ijson

This tutorial (https://www.dataquest.io/blog/python-json-tutorial/) works with a 600MB file. However, when I run their code:

import ijson

filename = "md_traffic.json"
with open(filename, 'r') as f:
    objects = ijson.items(f, 'meta.view.columns.item')
    columns = list(objects)


I'm running into 10+ minutes of waiting for the file to be read into ijson and I'm really confused how this is supposed to be reasonable. Shouldn't there be parsing? Am I missing something?

Answer

This looks like a direct copy/paste of the tutorial found here:

https://www.dataquest.io/blog/python-json-tutorial/

The reason it's taking so long is the list() call wrapped around the output of ijson.items. list() consumes the whole generator, which forces ijson to parse the entire file before any result is returned. Because ijson.items returns a generator, you can iterate over it instead and get the first result almost immediately:

import ijson

filename = "md_traffic.json"
with open(filename, 'r') as f:
    for item in ijson.items(f, 'meta.view.columns.item'):
        print(item)
        break
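
To make the difference visible without the 600MB file, here is a minimal stdlib-only sketch. Note that fake_items is a hypothetical stand-in for ijson.items (it is not part of ijson): it yields values lazily and counts how many it has produced, which roughly mimics how much "parsing" each approach triggers.

```python
def fake_items(total, counter):
    """Hypothetical stand-in for ijson.items: yields lazily, one item at a time."""
    for i in range(total):
        counter["produced"] += 1  # record that this item was actually produced
        yield {"id": i}


# list() drains the generator, so every item is produced before we see any.
eager_counter = {"produced": 0}
columns = list(fake_items(1000, eager_counter))
first_eager = columns[0]

# Breaking out of the loop stops the generator after a single item.
lazy_counter = {"produced": 0}
first_lazy = None
for item in fake_items(1000, lazy_counter):
    first_lazy = item
    break

print(eager_counter["produced"], lazy_counter["produced"])  # prints: 1000 1
```

Both approaches end up with the same first item; only the amount of work done before it becomes available differs.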

EDIT: The very next step in the tutorial is print(columns[0]), which is why I included printing the first item in the answer. Also, it's not clear whether the question was for Python 2 or 3, so the answer uses syntax that works in both, albeit inelegantly.
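
If only the first column is needed, the for/break pattern can arguably be written more compactly with the built-in next(), which accepts a default value and exists in both Python 2.6+ and 3. This is a sketch rather than anything from the tutorial: first_item and first_n are hypothetical helper names, and the ijson call appears only in a comment because it depends on the file from the question.

```python
from itertools import count, islice


def first_item(iterable, default=None):
    """Return the first element of any iterable, or default if it is empty."""
    return next(iter(iterable), default)


def first_n(iterable, n):
    """Return at most n leading elements without consuming the rest."""
    return list(islice(iterable, n))


# Intended use with the question's file (assumed, not run here):
#     with open(filename, 'r') as f:
#         print(first_item(ijson.items(f, 'meta.view.columns.item')))

# Demonstration on an endless generator, which list() could never finish.
squares = (x * x for x in count())
print(first_item(squares))  # prints: 0
print(first_n(squares, 3))  # prints: [1, 4, 9] (the generator resumes after 0)
```

Passing a default to next() avoids a StopIteration if the prefix matches nothing in the file.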
