JSON Question

Speed up execution time of JSON-ification and processing of data with Python

My script looks like this:

import json

with open('toy.json', 'rb') as inpt:
    lines = [json.loads(line) for line in inpt]

for line in lines:
    records = [item['hash'] for item in lines]
    for item in records:
        print item


What it does is read in data where each line is valid JSON, but the file as a whole is not valid JSON, because it's an aggregated dump from a web service.

The data looks, more or less, like this:

{"record":"value0","block":"0x79"}
{"record":"value1","block":"0x80"}


So the code above works, it allows me to interact with the data as JSON, but it's so slow that it's essentially useless.

Is there a good way to speed up this process?

EDIT:

with open('toy.json', 'rb') as inpt:
    for line in inpt:
        print("identifier: " + json.loads(line)['identifier'])
        print("value: " + json.loads(line)['value'])


EDIT II:

for line in inpt:
    resource = json.loads(line)
    print(resource['identifier'] + ", " + resource['value'])

Answer Source

You write:

for line in lines: 
    records = [item['hash'] for item in lines]

But this means that you construct the records list n times (where n is the number of lines). That work is redundant, and it makes the time complexity O(n²).

You can speed this up with:

with open('toy.json', 'rb') as inpt:
    for item in [json.loads(line)['hash'] for line in inpt]:
        print item

Or you can reduce the memory burden by printing each hash as you process its line:

with open('toy.json', 'rb') as inpt:
    for line in inpt:
        print json.loads(line)['hash']
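For reference, here is a self-contained Python 3 sketch of the same streaming approach, using an in-memory `io.StringIO` in place of the file so it runs anywhere (the `hash` key and the sample values are assumed from the question's data shape):

```python
import io
import json

# Stand-in for open('toy.json'): one JSON object per line
# ("JSON Lines" format), as in the question's dump.
data = io.StringIO(
    '{"hash": "value0", "block": "0x79"}\n'
    '{"hash": "value1", "block": "0x80"}\n'
)

# Parse each line exactly once as it is read:
# O(n) time overall, and only one record in memory at a time.
hashes = []
for line in data:
    hashes.append(json.loads(line)['hash'])

print(hashes)  # ['value0', 'value1']
```

With a real file you would replace the `StringIO` with `open('toy.json')` and keep the loop unchanged; since file objects iterate lazily line by line, the whole dump is never loaded at once.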