johan855 - 1 month ago
JSON Question

JSON Lines issue when loading an API response using Python

I'm having a hard time trying to load an API response into a file or a list.

The endpoint I'm using is {0}/json/latest?_apikey={1}

Previously all my scripts were set up to use normal JSON and everything was working well, but now they have decided to use JSON Lines, and the response somehow seems malformed.

The way I tried to adapt my scripts is to read the API response in the following way:

url_call = '{0}/json/latest?_apikey={1}'.format(extractors_row_dict['id'], auth_key)
r = requests.get(url_call)

with open(temporary_json_file_path, 'w') as outfile:
    json.dump(r.content, outfile)

data = []
with open(temporary_json_file_path) as f:
    for line in f:
        data.append(json.loads(line))

The problem is that when I check data[0], the entire content of the JSON file has been dumped into it...

data[1] raises IndexError: list index out of range

Here is an example of


Does anyone have experience with the response of this API?
All the other JSON Lines reads I do from other sources work fine; only this one fails.

EDIT based on comment:

print repr(open(temporary_json_file_path).read(300))

gives this:



The API gave you double-encoded data. Something pushed JSON data into the service, and the service then encoded that data again to a JSON string.
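
To make the idea concrete, here is a minimal sketch of what double encoding looks like. The sample records are made up (the actual payload isn't shown here), and the code is written for Python 3. It shows how a JSON Lines body that is encoded a second time collapses into a single line, which would be consistent with data[0] holding everything and data[1] not existing.

```python
import json

# Hypothetical sample records; the real API payload is not shown in the question.
records = [{"id": 1}, {"id": 2}]

# A JSON Lines body: one JSON document per line.
json_lines_body = "\n".join(json.dumps(rec) for rec in records)

# Encoding that text a second time yields a single JSON string literal.
# The newlines become escaped \n sequences, so the result occupies one line.
double_encoded = json.dumps(json_lines_body)

# The first decode only gives back the original text (a str, not a dict).
first_pass = json.loads(double_encoded)

# The second decode, applied line by line, recovers the records.
second_pass = [json.loads(line) for line in first_pass.splitlines()]
```

Reading `double_encoded` line by line would produce exactly one item, because the line breaks only reappear after the first decode.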

You'd have to decode it again:

with open(temporary_json_file_path) as f:
    for line in f:
        decoded = json.loads(line)
        # attempt to decode again; if it fails there was no double encoding
        try:
            decoded = json.loads(decoded)
        except TypeError:
            pass
        data.append(decoded)
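
If the decoded string itself contains several JSON Lines records, a single second json.loads call would not be enough. A possible variation, sketched below for Python 3, splits the inner text into lines again. The helper name load_json_lines is hypothetical (it is not part of any library), and the sketch assumes every real record is a JSON object or array, never a bare string.

```python
import json


def load_json_lines(text):
    """Parse JSON Lines text, unwrapping string-encoded layers as needed.

    Hypothetical helper. Assumes no genuine record is a bare JSON string.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        decoded = json.loads(line)
        if isinstance(decoded, str):
            # The line was a JSON string literal: treat its content as
            # JSON Lines text and parse that in turn.
            records.extend(load_json_lines(decoded))
        else:
            records.append(decoded)
    return records
```

With the code from the question, this could be called on the response body directly, for example data = load_json_lines(r.text), which would also avoid the round trip through a temporary file.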

It would be much, much better if this were fixed at the source, however. I'm not sure how extractor sets are built; this could be a bug in their code, or a bug in whatever is responsible for scraping sites.