Black Black - 4 months ago 15

Python: validate and format JSON files

I have around 2000 JSON files that I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format. Loading it fails with:

ValueError: No JSON object could be decoded

As a result, I can't read the file into my program.

I am currently doing something like the below:

import json

for filename in folder:
    with open(filename) as f:
        data = json.load(f)  # the error is raised here

I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free or cheap offline alternative for fixing all of them, i.e. a program I can run on the folder containing the JSON files that formats them as required?

SOLVED using @reece's comment:

import json
import os

invalid_json_files = []
read_json_files = []

def parse():
    for files in os.listdir(os.getcwd()):
        with open(files) as json_file:
            try:
                read_json_files.append(json.load(json_file))
            except ValueError as e:
                print("JSON object issue: %s" % e)
                invalid_json_files.append(files)
    print(invalid_json_files, len(read_json_files))

It turns out I was saving a non-JSON file in my working directory, which was also the directory I was reading the data from. Thanks for the helpful suggestions.


The built-in JSON module can be used as a validator:

import json

def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None  # or: raise

You can make it work with files by using:

with open(filename) as f:
    return json.load(f)

instead of json.loads, and you can include the filename in the error message as well.
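Putting those two pieces together, a file-based version might look roughly like the sketch below. The name parse_file and the exact wording of the message are my own choices for illustration, not anything the json module prescribes.

```python
import json

def parse_file(filename):
    """Return the decoded JSON from filename, or None if it is not valid JSON."""
    try:
        with open(filename) as f:
            return json.load(f)
    except ValueError as e:
        # Include the filename so the offending file is easy to find.
        print('invalid json in %s: %s' % (filename, e))
        return None
```

Catching ValueError works on both Python 2 and 3, since the JSONDecodeError raised by Python 3.5+ is a subclass of ValueError.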

On Python 3.3.5, for {test: "foo"}, I get:

invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

and on 2.7.6:

invalid json: Expecting property name: line 1 column 2 (char 1)

This is because JSON requires property names to be double-quoted strings, so the correct JSON is {"test": "foo"}.

When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
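One possible way to build that list is sketched below. It assumes the data files end in .json (so that skipped.txt and other stray files in the same folder are ignored); the function name validate_folder and the "name: error" line format are hypothetical.

```python
import json
import os

def validate_folder(folder, skipped_path='skipped.txt'):
    """Load every .json file in folder; log the invalid ones to skipped_path."""
    loaded = {}
    skipped = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith('.json'):
            continue  # assumption: only *.json files are data files
        path = os.path.join(folder, name)
        try:
            with open(path) as f:
                loaded[name] = json.load(f)
        except ValueError as e:
            # Do not process the file any further; just record it.
            skipped.append('%s: %s' % (name, e))
    with open(skipped_path, 'w') as out:
        out.write('\n'.join(skipped))
    return loaded, skipped
```

The valid documents come back keyed by filename, and skipped.txt ends up with one line per file that needs a manual look.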

If possible, check the site or program that generated the invalid JSON files, fix it, and then regenerate the files. Otherwise, you will keep getting new files that are invalid JSON.

Failing that, you will need to write a custom JSON parser that fixes common errors. In that case, keep the originals under source control (or archived), so you can review the differences the automated tool introduces as a sanity check. Ambiguous cases should be fixed by hand.
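As a rough illustration of such a tool (not a full parser), the sketch below handles a single, assumed error class: a trailing comma before a closing } or ]. The regex, the try_repair name, and the .orig backup suffix are all hypothetical choices. The rule is deliberately naive and would also touch a string value that happens to contain ",}" or ",]", which is why the original is copied aside first.

```python
import json
import re
import shutil

# Hypothetical rule: drop a comma that directly precedes a closing } or ].
# Naive: it would also alter a string value containing ",}" or ",]".
TRAILING_COMMA = re.compile(r',\s*([}\]])')

def try_repair(path):
    """Attempt a conservative fix of path; return True if the file now parses."""
    with open(path) as f:
        text = f.read()
    fixed = TRAILING_COMMA.sub(r'\1', text)
    try:
        data = json.loads(fixed)
    except ValueError:
        return False  # still invalid or ambiguous: leave it for a manual fix
    shutil.copy2(path, path + '.orig')  # keep the original for later diffing
    with open(path, 'w') as f:
        # Rewrite in a consistent, pretty-printed layout.
        json.dump(data, f, indent=4, sort_keys=True)
    return True
```

Files that still fail to parse are left untouched, so they can go on the list for manual fixing. Note that a file which was already valid is also rewritten in the pretty-printed layout.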