Saba Saba - 2 months ago 13
JSON Question

Parse only the first level of JSON

I have this kind of JSON file:

{
  "params": {
    "apiKey": "key",
    "sessionId": "123433890",
    "lang": "en",
    "timezone": "America/New_York",
    "query": "hi all",
    "latitude": "37.459157",
    "longitude": "-122.17926",
    "context": "[{"
     name ": "
     weather ","
     lifespan ": 4}]"
  }
}

It is not valid JSON because of this part:

"context": "[{"
name ": "
weather ","
lifespan ": 4}]"


so I cannot decode it with json_decode.

So I wonder whether it is possible to decode only the first-level keys. The result would look something like this:

array(1) {
  'params' =>
  array(8) {
    'apiKey' =>
    string(3) "key"
    'sessionId' =>
    string(9) "123433890"
    'lang' =>
    string(2) "en"
    'timezone' =>
    string(16) "America/New_York"
    'query' =>
    string(6) "hi all"
    'latitude' =>
    string(9) "37.459157"
    'longitude' =>
    string(10) "-122.17926"
    'context' =>
    string(38) "[{"name ": "weather ","lifespan ": 4}]"
  }
}


Thank you!

Also, this one is valid JSON, but I cannot decode it with json_decode either:

{
  "query": [
    "and for tomorrow"
  ],
  "contexts": "[{'name':'weather', 'lifespan' : 4}]",
  "location": {
    "latitude": 37.459157,
    "longitude": -122.17926
  },
  "timezone": "America/New_York",
  "lang": "en",
  "sessionId": "1234567890"
}

Answer

Your JSON is indeed not valid. It should look like this:

{
  "params": {
    "apiKey": "key",
    "sessionId": "123433890",
    "lang": "en",
    "timezone": "America/New_York",
    "query": "hi all",
    "latitude": "37.459157",
    "longitude": "-122.17926",
    "context": [{"name":"weather","lifespan": 4}]
  }
}

The error is that the value of the context key was wrapped in double quotes, and the double quotes inside it were not escaped. It should not be a string at all: it is a nested array containing an object.

If you have no control over the file and cannot fix it, you could use the following code, which tries to repair the JSON after you have read it:

<?php
// Invalid JSON as read from your file:
$json = '{
  "params": {
    "apiKey": "key",
    "sessionId": "123433890",
    "lang": "en",
    "timezone": "America/New_York",
    "query": "hi all",
    "latitude": "37.459157",
    "longitude": "-122.17926",
    "context": "[{"
     name ": "
     weather ","
     lifespan ": 4}]"
  }
}';
// Fix malformed JSON
$json = preg_replace_callback('~"([\[{].*?[}\]])"~s', function ($match) {
    return preg_replace('~\s*"\s*~', "\"", $match[1]);
}, $json);
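// Now you can do:
$arr = json_decode($json, true);
if ($arr === null) {
    // The repair did not produce valid JSON; report why.
    echo json_last_error_msg();
}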

After running the above code, $arr contains:

array (
  'params' => array (
    'apiKey' => 'key',
    'sessionId' => '123433890',
    'lang' => 'en',
    'timezone' => 'America/New_York',
    'query' => 'hi all',
    'latitude' => '37.459157',
    'longitude' => '-122.17926',
    'context' => array (
      array (
        'name' => 'weather',
        'lifespan' => 4,
      ),
    ),
  ),
)


Note that the context property now holds structured data (an array) instead of a string.

Explanation of the code

First, the code searches for the following pattern:

~"([\[{].*?[}\]])"~s

The ~ characters are just delimiters for the regular expression. Then:

  • ": matches a literal double quote (the opening quote of the broken string value)
  • ( ... ): the capture group, i.e. the part we actually want to keep. We want to remove the outermost double quotes, so they are placed outside these parentheses.
  • [\[{]: matches one of the literal characters [ or {
  • .*?: matches any characters, but as few as possible (the ? makes the quantifier lazy instead of greedy)
  • [}\]]: matches one of the literal characters } or ]
  • ": matches the closing double quote
  • s: a modifier that makes . also match newline characters
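To see what the s modifier and the lazy ? each contribute, the pattern can be tried in isolation. This is a small sketch; both sample strings are made up for illustration, the first one mimicking the line breaks of the broken context value.

```php
<?php
// Made-up fragment that reproduces the line breaks of the broken context value.
$broken = "\"context\": \"[{\"\n name \": \"\n weather \",\"\n lifespan \": 4}]\"";

// With the s modifier, . also matches newlines, so the match can span lines.
$withS = preg_match('~"([\[{].*?[}\]])"~s', $broken, $m);
// Without it, . stops at the first newline and nothing matches here.
$withoutS = preg_match('~"([\[{].*?[}\]])"~', $broken);

echo $withS, ' ', $withoutS, "\n"; // 1 0
echo $m[1], "\n";                  // captured part, outer quotes excluded

// Lazy vs greedy on a made-up string with two bracketed values.
$two = '"a": "[1]", "b": "[2]"';
preg_match_all('~"([\[{].*?[}\]])"~s', $two, $lazy);
preg_match_all('~"([\[{].*[}\]])"~s', $two, $greedy);
print_r($lazy[1]);   // two matches: [1] and [2]
print_r($greedy[1]); // one match: [1]", "b": "[2]
```

The greedy variant swallows everything up to the last ]" in the input, which is why the lazy form is used when a document could contain more than one such value.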

For every match, preg_replace_callback calls the function passed as its second argument, giving it an array of matches. The first element ($match[0]) is the complete match, and the second ($match[1]) is the captured part, i.e. the text matched inside the parentheses. That second element is the one we need:

$match[1]

We apply a second regular expression to that string, which removes all white-space, including newlines, around each double quote. This way, key names such as name end up tightly wrapped in double quotes, as they should be:

~\s*"\s*~

Again, the ~ are just delimiters for the regular expression.

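  • \s*: matches any amount of white-space (zero or more characters), including newlines
  • ": matches a literal double quote; each whole match is replaced by a single double quote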

The callback must return the modified string. preg_replace_callback then inserts it into the result in place of the complete match ($match[0]), which is how the outer double quotes disappear.
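The inner step can also be run on its own. In this sketch, $captured is assumed to hold the captured part ($match[1]) for the broken context value from the code above; decoding the tightened string then yields the nested array.

```php
<?php
// Assumed content of $match[1] for the broken context value.
$captured = "[{\"\n     name \": \"\n     weather \",\"\n     lifespan \": 4}]";

// Same inner replacement as in the callback: collapse white-space around each quote.
$tight = preg_replace('~\s*"\s*~', '"', $captured);
echo $tight, "\n"; // [{"name":"weather","lifespan": 4}]

// The tightened string is now valid JSON on its own.
var_dump(json_decode($tight, true));
```

Be aware that this replacement trims white-space next to every double quote inside the captured part, so a value that legitimately begins or ends with a space would lose it as well.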

Fixing the Real Cause

Of course, if you do have control over the file or over how it is generated, fix the cause of the issue instead.
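If the file happens to be produced by PHP (an assumption; the question does not say how it is generated), the usual fix is to build the data as PHP arrays and let json_encode handle all quoting, instead of pasting a JSON string into another JSON document by hand. A minimal sketch:

```php
<?php
// Hypothetical generator: context is a real nested array, not a pre-encoded string.
$data = [
    'params' => [
        'apiKey'    => 'key',
        'sessionId' => '123433890',
        'lang'      => 'en',
        'timezone'  => 'America/New_York',
        'query'     => 'hi all',
        'latitude'  => '37.459157',
        'longitude' => '-122.17926',
        'context'   => [['name' => 'weather', 'lifespan' => 4]],
    ],
];
// JSON_UNESCAPED_SLASHES keeps "America/New_York" readable (PHP 5.4+).
$json = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
echo $json, "\n";

// If context really must stay a string, json_encode escapes the inner quotes itself.
$asString = json_encode(['context' => '[{"name":"weather","lifespan":4}]']);
echo $asString, "\n"; // {"context":"[{\"name\":\"weather\",\"lifespan\":4}]"}
```

Either way, the output round-trips through json_decode without any regex repair.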

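Note that JSON does not allow single quotes as string delimiters; strings must be delimited by double quotes. That is why your second example, although valid as a whole, gives you contexts only as a string: "[{'name':'weather', 'lifespan' : 4}]" is not valid JSON itself, so json_decode cannot decode it a second time.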
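For the second example in the question, the outer document decodes fine; only the contexts value, being a string that uses single quotes, cannot be decoded again. Here is a sketch of decoding it anyway, under the assumption that no value inside contexts contains an apostrophe (a blind quote swap would break on one):

```php
<?php
// The second document from the question (valid JSON at the outer level).
$json = <<<'JSON'
{
  "query": ["and for tomorrow"],
  "contexts": "[{'name':'weather', 'lifespan' : 4}]",
  "location": {"latitude": 37.459157, "longitude": -122.17926},
  "timezone": "America/New_York",
  "lang": "en",
  "sessionId": "1234567890"
}
JSON;

$data = json_decode($json, true);
// The outer decode works; contexts is just a string here.
var_dump(is_string($data['contexts']));         // bool(true)
// Decoding that string directly fails because of the single quotes.
var_dump(json_decode($data['contexts'], true)); // NULL
// Assumption: no apostrophes inside the values, so swapping quotes is safe.
$contexts = json_decode(str_replace("'", '"', $data['contexts']), true);
print_r($contexts);
```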
