Tchotchke Tchotchke - 2 months ago 5x
JSON Question

Reading in string of nested JSON lists and dictionaries with Python

I am having trouble reading data in Python that I'm piping from a large file. A sample of one of the rows is:


When I load it with json.loads, the value for KEY2 is read in as a list, because of the brackets, which then prevents me from getting at my desired result, which is the value of KEY2b:

>>> import json
>>> foo_brackets_json=json.loads(foo_brackets)
>>> foo_brackets_json['KEY2']['KEY2b']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not str

I could just try to remove the brackets, but there actually is a value that should be a list, KEY2a. You can see this if I strip out all the brackets and try to convert to JSON:

>>> foo_no_brackets='{"KEY2":{"KEY2a":{"KEY2a1":"4","KEY2a2":"5"},{"KEY2a1":"6","KEY2a2":"7"},"KEY2b":"8"},"KEY3":"9"}'
>>> json.loads(foo_no_brackets)
# Traceback omitted since it's just the python error
ValueError: Expecting property name: line 1 column 45 (char 45)

foo_brackets does appear to be valid JSON (I tested here, with the quotes removed) and got the following:



Using a combination of Python or bash, is there a way for me to read objects like foo_brackets so that I can call foo_brackets_json['KEY2']['KEY2b']?

I mention bash because I'm actually piping the contents of a large file to a Python script for analysis. If I could do something like
zcat my_data.gz | (clean up JSON here) |
then that would work.


foo_brackets_json['KEY2'] references a list, here with one element.

You'll have to use integer indices to reference the dictionaries contained in that list:
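
A minimal sketch of that lookup. The sample row is an assumption: it is rebuilt from the bracket-free string in the question by putting the list brackets back around the values of KEY2 and KEY2a, so the real data may differ.

```python
import json

# Assumed sample row: the question's bracket-free string with the list
# brackets restored around the values of KEY2 and KEY2a.
foo_brackets = ('{"KEY2":[{"KEY2a":[{"KEY2a1":"4","KEY2a2":"5"},'
                '{"KEY2a1":"6","KEY2a2":"7"}],"KEY2b":"8"}],"KEY3":"9"}')

foo_brackets_json = json.loads(foo_brackets)

# KEY2 holds a list, so pick its first element before using a string key.
first = foo_brackets_json['KEY2'][0]
key2b = first['KEY2b']   # the value the question is after
key2a = first['KEY2a']   # still a list of dictionaries, left untouched

print(key2b)
```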


Don't try to remove the brackets; there could be 0 or more nested dictionaries here. You'll have to determine what should happen in those cases where you don't have just 1 nested dictionary.

The above hardcoded reference assumes there is always at least one such dictionary in the list, and doesn't care if there is more than one.

You could use looping to handle the 0 or more case:

for nested in foo_brackets_json['KEY2']:
    print(nested['KEY2b'])

Now you are handling each nested dictionary, one by one. This'll work for the empty list case, and if there is more than one.
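
If the values are needed later rather than handled inside the loop, the same iteration can collect them into a list. This is only a sketch: collect_key2b is a hypothetical helper name, and the three rows are made-up stand-ins for parsed data.

```python
import json

def collect_key2b(row):
    """Return the KEY2b value of every dictionary in row['KEY2']."""
    # An empty list simply produces an empty result.
    return [nested['KEY2b'] for nested in row['KEY2']]

# Made-up rows covering one, zero and several nested dictionaries.
one = json.loads('{"KEY2":[{"KEY2b":"8"}],"KEY3":"9"}')
none = json.loads('{"KEY2":[],"KEY3":"9"}')
many = json.loads('{"KEY2":[{"KEY2b":"8"},{"KEY2b":"10"}],"KEY3":"9"}')

print(collect_key2b(one))
print(collect_key2b(none))
print(collect_key2b(many))
```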

You could make having 0 or more than one an error:

if len(foo_brackets_json['KEY2']) != 1:
    raise ValueError('Unexpected number of results')
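
The same check can be wrapped in a small function so it runs for every row. A sketch, assuming exactly one nested dictionary is the only acceptable case; single_key2b is a hypothetical name and the row is made up.

```python
import json

def single_key2b(row):
    """Return KEY2b from the only dictionary in row['KEY2'], or raise."""
    nested = row['KEY2']
    if len(nested) != 1:
        raise ValueError('Unexpected number of results: %d' % len(nested))
    return nested[0]['KEY2b']

# Made-up row with exactly one nested dictionary.
row = json.loads('{"KEY2":[{"KEY2b":"8"}],"KEY3":"9"}')
value = single_key2b(row)
print(value)
```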

etc. etc. It all depends on your actual use-case.
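
Since the rows arrive on a pipe, no clean-up step in bash should be needed; the script can parse each line as it comes in. The sketch below assumes one JSON object per line, which the question does not confirm. iter_key2b is a hypothetical name, and an io.StringIO object with made-up rows stands in for sys.stdin so the example runs on its own.

```python
import io
import json

def iter_key2b(lines):
    """Yield every KEY2b value from a stream with one JSON object per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        # KEY2 is a list of 0 or more dictionaries; visit each in turn.
        for nested in row['KEY2']:
            yield nested['KEY2b']

# Stand-in for the piped input; a real script would pass sys.stdin instead.
fake_stdin = io.StringIO(
    '{"KEY2":[{"KEY2b":"8"}],"KEY3":"9"}\n'
    '{"KEY2":[],"KEY3":"1"}\n'
    '{"KEY2":[{"KEY2b":"2"},{"KEY2b":"3"}],"KEY3":"4"}\n'
)

values = list(iter_key2b(fake_stdin))
print(values)
```

Because iter_key2b is a generator, rows are parsed one at a time, so a large file does not have to fit in memory.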