
How to efficiently decode a large number of small JSON data chunks?

I'm going to write a parser for a log file where each line is one JSON record.

I could decode each line in a loop:

logs = [json.loads(line) for line in lines]

or I could decode the whole file in one go:

logs = json.loads('[' + ','.join(lines) + ']')

I want to minimize execution time; please disregard other factors. Is there any reason to prefer one approach over the other?


You can easily test it with timeit:

$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' '[json.loads(line) for line in lines]'
100 loops, best of 3: 2.22 msec per loop
$ python -m timeit -s 'import json; lines = ["{\"foo\":\"bar\"}"] * 1000' "json.loads('[' + ','.join(lines) + ']')"
1000 loops, best of 3: 839 usec per loop

In this case, combining the data and parsing it in one go is about 2.5 times faster.
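
If you prefer to run the comparison from a script rather than the shell, here is a minimal sketch using the timeit module with the same synthetic 1000-line input as above; the exact numbers will of course vary with your machine and Python version.

import json
import timeit

lines = ['{"foo":"bar"}'] * 1000

# Decode each record separately, one json.loads call per line.
per_line = timeit.timeit(lambda: [json.loads(line) for line in lines], number=100)

# Join the records into one JSON array and decode it with a single call.
combined = timeit.timeit(lambda: json.loads('[' + ','.join(lines) + ']'), number=100)

print(f"per line: {per_line / 100 * 1e3:.2f} ms per pass")
print(f"combined: {combined / 100 * 1e3:.2f} ms per pass")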