Jeanne Diderot - 10 months ago

Load an element with Python from a large JSON file

So, here is my JSON file. I want to load only the 'data' lists from it, one by one, and then, for example, plot them.

This is only an example: I am dealing with a large data set, so I cannot load the whole file at once (that would raise a MemoryError).

{
  "earth": {
    "europe": [
      {"name": "Paris", "type": "city"},
      {"name": "Thames", "type": "river"},
      {"par": 2, "data": [1,7,4,7,5,7,7,6]},
      {"par": 2, "data": [1,0,4,1,5,1,1,1]},
      {"par": 2, "data": [1,0,0,0,5,0,0,0]}
    ],
    "america": [
      {"name": "Texas", "type": "state"}
    ]
  }
}

Here is what I tried:

import ijson
filename = "testfile.json"

f = open(filename)
mylist = ijson.items(f, 'earth.europe[2].data.item')
print(mylist)

It returns nothing, even when I try to convert it into a list.


Answer

You need to specify a valid prefix. ijson prefixes are made of dictionary keys and the word item (for list entries), joined by dots. You can't select a specific list item, so [2] doesn't work.
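To see why [2] can never match, it helps to look at the prefix strings themselves. The sketch below is not ijson: it is a small, hypothetical stdlib-only helper (iter_prefixes) that walks an already-loaded document and builds prefixes following the rule above, keys joined by dots and item for every list entry. It assumes a tiny in-memory document, so it only illustrates the naming, not the streaming.

```python
import json


def iter_prefixes(node, prefix=""):
    """Yield (prefix, value) for every value in a parsed JSON document.

    Hypothetical helper: mimics ijson's prefix naming (keys joined with
    dots, 'item' for list entries) on an already-loaded Python object.
    """
    yield prefix, node
    if isinstance(node, dict):
        for key, child in node.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            yield from iter_prefixes(child, child_prefix)
    elif isinstance(node, list):
        for child in node:
            child_prefix = f"{prefix}.item" if prefix else "item"
            yield from iter_prefixes(child, child_prefix)


doc = json.loads('{"earth": {"europe": [{"name": "Paris"}, {"par": 2, "data": [1, 7]}]}}')
prefixes = {p for p, _ in iter_prefixes(doc)}
print(sorted(prefixes))
print("earth.europe[2].data.item" in prefixes)    # False: prefixes never contain indices
print("earth.europe.item.data.item" in prefixes)  # True: one prefix per number in 'data'
```

Every entry of a list shares the same item prefix, which is why a prefix can say "any entry" but not "the third entry".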

If you want the 'data' list of every dictionary in the europe list, the prefix is:

earth.europe.item.data
# ^ ------------------- outermost key must be 'earth'
#       ^ ------------- next key must be 'europe'
#              ^ ------ any value in the array
#                   ^   the value for the 'data' key

This produces each such list:

>>> l = ijson.items(f, 'earth.europe.item.data')
>>> for data in l:
...     print(data)
[1, 7, 4, 7, 5, 7, 7, 6]
[1, 0, 4, 1, 5, 1, 1, 1]
[1, 0, 0, 0, 5, 0, 0, 0]
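Because ijson.items() returns an iterator, you can still reach one particular entry, such as europe[2] from the question, by skipping matches instead of indexing. The sketch below uses the nth recipe from the itertools documentation; the europe_items() generator is a hypothetical stand-in for ijson.items(f, 'earth.europe.item') so that the example runs without a file. Skipped entries are built and discarded one at a time, so memory stays bounded, although the file is still read up to that point.

```python
from itertools import islice


def nth(iterable, n, default=None):
    """Return the n-th (0-based) item of an iterator, or default if it is too short."""
    return next(islice(iterable, n, None), default)


def europe_items():
    # Hypothetical stand-in for ijson.items(f, 'earth.europe.item'):
    # a lazy stream of the dictionaries in the 'europe' list.
    yield {"name": "Paris", "type": "city"}
    yield {"name": "Thames", "type": "river"}
    yield {"par": 2, "data": [1, 7, 4, 7, 5, 7, 7, 6]}
    yield {"par": 2, "data": [1, 0, 4, 1, 5, 1, 1, 1]}


third = nth(europe_items(), 2)
print(third["data"])  # [1, 7, 4, 7, 5, 7, 7, 6]
```

With a real file the call would be nth(ijson.items(f, 'earth.europe.item'), 2).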

You can't put wildcards in a prefix, so you can't use earth.* to match every continent, for example.
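If you do want something like earth.*, one option (not an ijson feature) is to match the prefix strings yourself, for example with fnmatch from the standard library, on the (prefix, event, value) triples that ijson.parse() yields. In this sketch the triples are written out by hand with ijson's event names so it runs without a file, and select_values is a hypothetical helper name. It only picks out scalar values; building whole lists from the events is what the ObjectBuilder approach is for. Note that fnmatch's * also matches dots, so it can span several levels.

```python
from fnmatch import fnmatchcase

# Events that describe structure rather than carrying a scalar value.
CONTAINER_EVENTS = {"start_map", "end_map", "start_array", "end_array", "map_key"}


def select_values(events, pattern):
    """Yield (prefix, value) for scalar events whose prefix matches a glob pattern.

    events: iterable of (prefix, event, value) triples shaped like the output
    of ijson.parse(). Hypothetical helper, not part of ijson.
    """
    for prefix, event, value in events:
        if event not in CONTAINER_EVENTS and fnmatchcase(prefix, pattern):
            yield prefix, value


# Hand-written events for
# {"earth": {"europe": [{"data": [1, 7]}], "america": [{"data": [3]}]}}
events = [
    ("", "start_map", None),
    ("", "map_key", "earth"),
    ("earth", "start_map", None),
    ("earth", "map_key", "europe"),
    ("earth.europe", "start_array", None),
    ("earth.europe.item", "start_map", None),
    ("earth.europe.item", "map_key", "data"),
    ("earth.europe.item.data", "start_array", None),
    ("earth.europe.item.data.item", "number", 1),
    ("earth.europe.item.data.item", "number", 7),
    ("earth.europe.item.data", "end_array", None),
    ("earth.europe.item", "end_map", None),
    ("earth.europe", "end_array", None),
    ("earth", "map_key", "america"),
    ("earth.america", "start_array", None),
    ("earth.america.item", "start_map", None),
    ("earth.america.item", "map_key", "data"),
    ("earth.america.item.data", "start_array", None),
    ("earth.america.item.data.item", "number", 3),
    ("earth.america.item.data", "end_array", None),
    ("earth.america.item", "end_map", None),
    ("earth.america", "end_array", None),
    ("earth", "end_map", None),
    ("", "end_map", None),
]

for prefix, value in select_values(events, "earth.*.item.data.item"):
    print(prefix, value)
```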

If you need more complex prefix matching, you'll have to use the ijson.parse() function and handle the events it produces. You can reuse the ijson.ObjectBuilder class to turn the events you are interested in into Python objects:

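import ijson

with open('testfile.json', 'rb') as f:
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        if event != 'start_array':
            continue  # only the start of an array can begin a match
        if prefix.startswith('earth.') and prefix.endswith('.item.data'):
            continent = prefix.split('.', 2)[1]  # 'earth.europe.item.data' -> 'europe'
            builder = ijson.ObjectBuilder()
            builder.event(event, value)
            # feed the array's events to the builder until that array closes
            for nprefix, event, value in parser:
                if (nprefix, event) == (prefix, 'end_array'):
                    break
                builder.event(event, value)
            data = builder.value
            print(continent, data)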

This prints every array stored under the 'data' key of a list entry (so its prefix ends with '.item.data') anywhere below the top-level 'earth' key. It also extracts the continent key from the prefix.
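The loop above hands every event of a matched array to ijson.ObjectBuilder. To show roughly what such a builder does with those events, here is a deliberately simplified, hypothetical re-implementation (MiniBuilder); it is not ijson's code and ignores details such as custom map types. It keeps a stack of open containers: start_map and start_array store a new container and push it, end_map and end_array pop, map_key remembers the key, and any other event stores a scalar.

```python
class MiniBuilder:
    """Simplified, hypothetical stand-in for ijson.ObjectBuilder."""

    def __init__(self):
        self.value = None
        self.key = None
        self.stack = []  # open containers, innermost last

    def _store(self, item):
        if not self.stack:
            self.value = item                 # the outermost value being built
        elif isinstance(self.stack[-1], list):
            self.stack[-1].append(item)       # next list entry
        else:
            self.stack[-1][self.key] = item   # value for the most recent map_key

    def event(self, event, value):
        if event == "map_key":
            self.key = value
        elif event in ("start_map", "start_array"):
            container = {} if event == "start_map" else []
            self._store(container)
            self.stack.append(container)
        elif event in ("end_map", "end_array"):
            self.stack.pop()
        else:
            self._store(value)


# Events for {"par": 2, "data": [1, 0, 4]}, in the form ijson.parse() emits them
builder = MiniBuilder()
for event, value in [
    ("start_map", None), ("map_key", "par"), ("number", 2),
    ("map_key", "data"), ("start_array", None),
    ("number", 1), ("number", 0), ("number", 4),
    ("end_array", None), ("end_map", None),
]:
    builder.event(event, value)
print(builder.value)  # {'par': 2, 'data': [1, 0, 4]}
```

Because only the matched array's events reach the builder, only that one array is ever held in memory.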