Jeanne Diderot - 1 month ago
JSON Question

Load an element with Python from a large JSON file

So, here is my JSON file. I want to load the data lists from it one by one, and nothing else, and then, for example, plot them.

This is just an example; my real data set is so large that I cannot load the whole file (that would cause a memory error).

{
"earth": {
"europe": [
{"name": "Paris", "type": "city"},
{"name": "Thames", "type": "river"},
{"par": 2, "data": [1,7,4,7,5,7,7,6]},
{"par": 2, "data": [1,0,4,1,5,1,1,1]},
{"par": 2, "data": [1,0,0,0,5,0,0,0]}
],
"america": [
{"name": "Texas", "type": "state"}
]
}
}


Here is what I tried:

import ijson
filename = "testfile.json"

f = open(filename)
mylist = ijson.items(f, 'earth.europe[2].data.item')
print mylist


It returns nothing; even when I convert the result to a list, I get an empty one:

[]

Answer

You need to specify a valid prefix; ijson prefixes are either keys in a dictionary or the word item for list entries. You can't select a specific list item (so [2] doesn't work).

If you wanted the value of the data key from every dictionary in the europe list, then the prefix is:

earth.europe.item.data
# ^ ------------------- outermost key must be 'earth'
#       ^ ------------- next key must be 'europe'
#              ^ ------ any value in the array
#                   ^   the value for the 'data' key

This produces each such list:

>>> l = ijson.items(f, 'earth.europe.item.data')
>>> for data in l:
...     print(data)
...
[1, 7, 4, 7, 5, 7, 7, 6]
[1, 0, 4, 1, 5, 1, 1, 1]
[1, 0, 0, 0, 5, 0, 0, 0]

Prefixes don't support wildcards, so something like earth.*.item.data won't work.

If you need more complex prefix matching, you'd have to use the ijson.parse() function and handle the events it produces yourself. You can reuse the ijson.ObjectBuilder() class to turn the events you are interested in into Python objects:

parser = ijson.parse(f)
for prefix, event, value in parser:
    # Only the start of an array can begin a 'data' list.
    if event != 'start_array':
        continue
    if prefix.startswith('earth.') and prefix.endswith('.item.data'):
        # The prefix looks like 'earth.<continent>.item.data'.
        continent = prefix.split('.', 2)[1]
        builder = ijson.ObjectBuilder()
        builder.event(event, value)
        # Feed the builder from the same parser until this array closes.
        for nprefix, event, value in parser:
            if (nprefix, event) == (prefix, 'end_array'):
                break
            builder.event(event, value)
        data = builder.value
        print(continent, data)

This prints every array stored under a 'data' key of a dictionary inside a list (that is, under a prefix ending in '.item.data'), provided it sits below the top-level 'earth' key. It also extracts the continent key from the prefix.
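The startswith()/endswith() test above could be generalised into a per-segment wildcard match on the prefix string. The following is only a sketch using the standard library's fnmatch module; prefix_matches is a hypothetical helper, not something ijson provides, and it assumes that no key in the document contains a dot.

```python
from fnmatch import fnmatchcase


def prefix_matches(prefix, pattern):
    """Return True if each dot-separated segment of prefix matches the
    corresponding segment of pattern (shell-style wildcards per segment)."""
    prefix_parts = prefix.split('.')
    pattern_parts = pattern.split('.')
    # A '*' stands for exactly one segment, so the lengths must agree.
    if len(prefix_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(seg, pat)
               for seg, pat in zip(prefix_parts, pattern_parts))


candidates = (
    'earth.europe.item.data',
    'earth.america.item.data',
    'earth.europe.item.name',
    'earth.europe.item.data.item',
)
for candidate in candidates:
    print(candidate, prefix_matches(candidate, 'earth.*.item.data'))
```

In the event loop, a call such as prefix_matches(prefix, 'earth.*.item.data') could then stand in for the startswith()/endswith() pair.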