Raffael Edu - 18 days ago
Python Question

Reading a JSON file with multiple objects in Python

I'm a beginner in programming and Python. I know there are a lot of explanations in previous questions about this, but I read all of them carefully and didn't find a solution.

I'm trying to read a JSON file that contains about 1 billion lines of data like this:

334465|{"color":"33ef","age":"55","gender":"m"}
334477|{"color":"3444","age":"56","gender":"f"}
334477|{"color":"3999","age":"70","gender":"m"}


I have been struggling to get past the 6-digit number at the beginning of each line, and I don't know how to read multiple JSON objects.
Here is my code, but I can't figure out why it isn't working:

import json

T =[]
s = open('simple.json', 'r')
ss = s.read()
for line in ss:
    line = ss[7:]
    T.append(json.loads(line))
s.close()


And here is the error that I got:

ValueError: Extra Data: line 3 column 1 - line 5 column 48 (char 42 - 138)


Any suggestion would be very helpful for me!

Answer

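Your code fails because read() returns the whole file as one string. That means "for line in ss" loops over single characters, and ss[7:] is the entire file minus its first 7 characters, so json.loads() sees several objects at once and raises "Extra data". Read the file line by line instead, and wrap your JSON parsing in a try/except block so that one bad line doesn't stop the run. readlines() also works, but it loads every line into memory at once, which matters with about 1 billion lines; iterating over the file object reads one line at a time. The trailing newline on each line is not a problem, because json.loads() ignores surrounding whitespace.

import json

T = []
with open('simple.json', 'r') as s:
    # Iterating over the file object reads one line at a time
    for line in s:
        try:
            # Keep everything after the first '|' (the JSON part)
            j = line.partition('|')[2]
            T.append(json.loads(j))
        except ValueError:
            # Blank line or bad JSON: skip it
            continue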
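
If you also need the number in front of the "|", or the file is too big to collect everything in a list, a generator may suit you better. This is only a minimal sketch, assuming every useful line has the form id|json; iter_records is a made-up helper name, and the sample list stands in for the real file:

```python
import json


def iter_records(lines):
    """Yield (record_id, obj) pairs from lines shaped like 'id|{json}'.

    iter_records is a hypothetical helper, not part of any library.
    """
    for line in lines:
        # Split on the first '|' only, so a '|' inside the JSON is preserved
        record_id, sep, payload = line.partition('|')
        if not sep:
            continue  # no '|' on this line (e.g. a blank line)
        try:
            yield record_id, json.loads(payload)
        except ValueError:
            # Malformed JSON on this line; skip it
            continue


# Sample lines based on the question; a real run would pass an open file
sample = [
    '334465|{"color":"33ef","age":"55","gender":"m"}\n',
    '334477|{"color":"3444","age":"56","gender":"f"}\n',
    '\n',
    '334477|not json\n',
]
records = list(iter_records(sample))
```

With the real file you would pass the open file object (for example, open 'simple.json' in a with block and loop over iter_records(s)), handling each pair as it arrives rather than calling list() on it, so only one line is held in memory at a time.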