Tchotchke - 4 months ago

Reading a string of nested JSON lists and dictionaries with Python

I am having trouble reading data in Python that I'm piping from a large .gz file. A sample of one of the rows is:

foo_brackets='{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'


When I load it with json, the value for KEY2 is read in as a list because of the brackets. That prevents me from getting my desired result, the value of KEY2b:

>>> import json
>>> foo_brackets_json=json.loads(foo_brackets)
>>> foo_brackets_json['KEY2']['KEY2b']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str
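A quick way to see the problem is to inspect the types that json.loads produces. This sketch assumes Python 3 (the traceback above appears to come from Python 2, whose TypeError wording differs slightly):

```python
import json

# The sample row from the question, verbatim.
foo_brackets = '{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'

foo_brackets_json = json.loads(foo_brackets)

# JSON arrays ([...]) become Python lists; JSON objects ({...}) become dicts.
print(type(foo_brackets_json))             # <class 'dict'>
print(type(foo_brackets_json['KEY2']))     # <class 'list'>
print(len(foo_brackets_json['KEY2']))      # 1
print(type(foo_brackets_json['KEY2'][0]))  # <class 'dict'>
```

So KEY2b is not missing; it sits one level down, inside the single dictionary that the KEY2 list contains.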


I could just try to remove the brackets, but there actually is a value that should be a list: KEY2a. You can see this if I strip out all the brackets and try to convert to JSON:

>>> foo_no_brackets='{"KEY2":{"KEY2a":{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"},"KEY2b":"8"},"KEY3":"9"}'
>>> json.loads(foo_no_brackets)
# Traceback omitted since it's just the python error
ValueError: Expecting property name: line 1 column 45 (char 45)
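The stripped string fails because, inside a JSON object, a comma must be followed by another quoted key, and here the parser finds the second {...} object instead. A small sketch, assuming Python 3.5+ (where json.JSONDecodeError, a subclass of ValueError, exists); parse_error is a hypothetical helper name:

```python
import json

# Brackets stripped, as in the question: KEY2a is now followed by two
# comma-separated objects, which is not legal inside a JSON object.
foo_no_brackets = '{"KEY2":{"KEY2a":{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"},"KEY2b":"8"},"KEY3":"9"}'


def parse_error(text):
    """Return the JSONDecodeError raised for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err
    return None


error = parse_error(foo_no_brackets)
# error.pos is the 0-based index where parsing stopped.
print(error.msg)
print(error.pos, repr(foo_no_brackets[error.pos]))
```

The character at error.pos is the opening brace of the second KEY2a object, which is exactly the spot where the removed [ used to start a list.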


foo_brackets does appear to be valid JSON (I tested it here, with the surrounding quotes removed) and got the following:

{
  "KEY2":[
    {
      "KEY2a":[
        {
          "KEY2a1":"4",
          "KEY2a2":"5"
        },
        {
          "KEY2a1":"6",
          "KEY2a2":"7"
        }
      ],
      "KEY2b":"8"
    }
  ],
  "KEY3":"9"
}
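Python can also produce an indented view like the one above locally, without an online validator. A minimal sketch using the standard library's json.dumps; parsing and re-serializing the string also confirms that it is valid JSON:

```python
import json

foo_brackets = '{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'

# json.loads raises on invalid input; indent=2 nests each level by two spaces.
pretty = json.dumps(json.loads(foo_brackets), indent=2)
print(pretty)
```

From the command line, python -m json.tool reads JSON on standard input and pretty-prints it in the same way.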


Question:



Using Python, bash, or a combination of the two, is there a way for me to read objects like foo_brackets so that I can call foo_brackets_json['KEY2']['KEY2b']?

I mention bash because I'm actually piping the contents of a large .gz file to a Python script for analysis, so a pipeline like this would work for me:

zcat my_data.gz | (clean up JSON here) | analysis.py
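Because json.loads already parses the brackets correctly, the "(clean up JSON here)" stage may not be needed at all. A minimal sketch of what analysis.py could look like, assuming the .gz file holds one JSON document per line (newline-delimited JSON); iter_key2b and main are hypothetical names:

```python
import json
import sys
from typing import Iterable, Iterator


def iter_key2b(lines: Iterable[str]) -> Iterator[str]:
    """Yield every KEY2b value from newline-delimited JSON rows.

    Assumes one JSON document per line, shaped like foo_brackets.
    Rows without a KEY2 key are treated as having an empty list.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        row = json.loads(line)
        # KEY2 is a JSON array, so loop over it instead of stripping brackets.
        for nested in row.get('KEY2', []):
            yield nested['KEY2b']


def main() -> None:
    """Entry point for a hypothetical analysis.py fed by zcat on stdin."""
    for value in iter_key2b(sys.stdin):
        print(value)
```

Adding `if __name__ == '__main__': main()` at the bottom lets it run as zcat my_data.gz | python analysis.py. Alternatively, Python's gzip module can open the file directly with gzip.open(path, 'rt'), and the resulting file object can be passed to iter_key2b, removing the need for zcat.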

Answer

foo_brackets_json['KEY2'] references a list, here with one element.

You'll have to use integer indices to reference the dictionaries contained in that list:

foo_brackets_json['KEY2'][0]['KEY2b']
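Applied to the sample row, a minimal runnable version of that indexed lookup looks like this (the second lookup, into KEY2a, is added here for illustration and follows the same rule):

```python
import json

foo_brackets = '{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'
foo_brackets_json = json.loads(foo_brackets)

# KEY2 is a list; [0] selects its first (here, only) dictionary.
key2b = foo_brackets_json['KEY2'][0]['KEY2b']
print(key2b)  # prints 8

# KEY2a is also a list, so it needs an integer index too; [1] is the second object.
key2a1_second = foo_brackets_json['KEY2'][0]['KEY2a'][1]['KEY2a1']
print(key2a1_second)  # prints 6
```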

Don't try to remove the brackets; there could be zero or more nested dictionaries here. You'll have to decide what should happen in the cases where you don't have exactly one nested dictionary.

The hardcoded reference above assumes there is always at least one such dictionary in the list, and it ignores any beyond the first.
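If an empty list is possible and you would rather get a default value than an IndexError, one option is a small guard. A sketch; first_key2b and its default parameter are hypothetical names:

```python
import json

foo_brackets = '{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'


def first_key2b(doc, default=None):
    """Return KEY2b from the first dictionary in doc['KEY2'].

    Returns default when the list is empty; like the [0] lookup,
    any dictionaries after the first are ignored.
    """
    nested_list = doc['KEY2']
    if not nested_list:
        return default
    return nested_list[0]['KEY2b']


print(first_key2b(json.loads(foo_brackets)))  # prints 8
print(first_key2b({'KEY2': []}))              # prints None
```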

You could use looping to handle the 0 or more case:

for nested in foo_brackets_json['KEY2']:
    print(nested['KEY2b'])

Now you are handling each nested dictionary, one by one. This works for an empty list, and when there is more than one.
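The same looping idea extends to KEY2a, which is also a list. A sketch that walks both levels of the sample row and then collects the same values with list comprehensions:

```python
import json

foo_brackets = '{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'
foo_brackets_json = json.loads(foo_brackets)

# Every level written with [...] in the JSON is a list, so loop over it.
for nested in foo_brackets_json['KEY2']:
    print('KEY2b:', nested['KEY2b'])
    for pair in nested['KEY2a']:
        print('  KEY2a1:', pair['KEY2a1'], 'KEY2a2:', pair['KEY2a2'])

# The same values gathered into lists; an empty KEY2 simply yields [].
key2b_values = [nested['KEY2b'] for nested in foo_brackets_json['KEY2']]
key2a1_values = [pair['KEY2a1']
                 for nested in foo_brackets_json['KEY2']
                 for pair in nested['KEY2a']]
```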

You could also treat having zero, or more than one, as an error:

if len(foo_brackets_json['KEY2']) != 1:
    raise ValueError('Unexpected number of results')

And so on; it all depends on your actual use case.
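The length check can be wrapped in a function so that each row either yields exactly one KEY2b or fails loudly. A sketch; only_key2b is a hypothetical name:

```python
import json

foo_brackets = '{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}'


def only_key2b(doc):
    """Return KEY2b, raising ValueError unless doc['KEY2'] has exactly one entry."""
    nested_list = doc['KEY2']
    if len(nested_list) != 1:
        raise ValueError('Unexpected number of results: %d' % len(nested_list))
    return nested_list[0]['KEY2b']


print(only_key2b(json.loads(foo_brackets)))  # prints 8
```

A terser equivalent is single-element unpacking, (nested,) = doc['KEY2'], which raises ValueError by itself when the list does not hold exactly one item; the explicit check above just gives a clearer message.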