Draconis Draconis - 1 year ago 93
Javascript Question

Matching JSON with a regular expression

I have a JavaScript file containing many object literals:

// lots of irrelevant code
key1: "string value",
key2: 12345,
key3: "strings which may contain ({ arbitrary characters })"
// more irrelevant code

I need to write some Python code to extract these literals.

My first attempt was a regular expression
. But this fails if the literal contains a "})".

Since I know the objects will be valid JSON (matched quotes, braces, etc) in a valid JavaScript file, is there a more elegant way to extract them?

(In other words, the difficulty is removing all the other JavaScript code I don't care about.)

EDIT: In the end, I used a regular expression for any objects which don't contain sub-objects...


...and tracked open/close braces by hand for anything with nesting.

Answer Source

Why not writing a state machine that reads { and increments a counter on every { and decrements it with every } so when it reaches 0 again, take all the characters in the middle and use the json parser from python to check if it is valid or not? on that way, you can get the benefit of syntactical errors instead of a simple match no match from the regex (remember python is { free so false positives are impossible).