Josh - 3 months ago

Unable to parse TAB in JSON files

I am running into a parsing problem when loading JSON files that seem to contain TAB characters.

When I paste the part with the TAB character into http://jsonlint.com/:

{
"My_String": "Foo bar. Bar foo."
}


The validator complains with:

Parse error on line 2:
{ "My_String": "Foo bar. Bar foo."
------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['


This is literally a copy/paste of the offending JSON text.

I have tried loading this file with json and simplejson without success. How can I load it properly? Should I just pre-process the file and replace each TAB with \t or with a space? Or is there something I am missing here?
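One option the question does not mention: Python's standard-library json module has a strict parameter, and with strict=False it allows control characters such as a literal TAB inside strings. A minimal sketch, assuming the input is otherwise valid JSON. Loading leniently and re-dumping with json.dumps writes the TAB back out as the escape \t, which avoids a blind text replace (replacing every TAB with \t would also hit TABs used as whitespace between tokens, and \t outside a string is not valid JSON).

```python
import json

# JSON text with a literal TAB (U+0009) inside a string value, as in the
# question. In a normal Python literal, "\t" becomes a real TAB character.
raw = '{"My_string": "Foo bar.\t Bar foo."}'

# The default (strict=True) rejects control characters inside strings.
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("default parse failed:", exc.msg)

# strict=False lets control characters through, so the TAB is kept as-is.
data = json.loads(raw, strict=False)

# json.dumps escapes the TAB as \t, so the output is valid JSON that
# parses with the default strict=True.
fixed = json.dumps(data)
print(fixed)  # prints {"My_string": "Foo bar.\t Bar foo."}
```

To read a file directly, json.load(f, strict=False) accepts the same keyword.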

Update:

Here is also a problematic example in simplejson:

import simplejson
# In a normal Python literal, \t is a real TAB character
foo = '{"My_string": "Foo bar.\t Bar foo."}'
simplejson.loads(foo)

JSONDecodeError: Invalid control character '\t' at: line 1 column 24 (char 23)

Answer

From the JSON standard (ECMA-404):

Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
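The quoted rule can be checked directly with Python's standard-library json module. A small sketch: a TAB between tokens is accepted as insignificant whitespace, while the same character inside a string token is rejected.

```python
import json

# TABs between tokens are insignificant whitespace, so this parses.
between_tokens = '{\t"My_string":\t"Foo bar. Bar foo."\t}'
print(json.loads(between_tokens))  # prints {'My_string': 'Foo bar. Bar foo.'}

# A literal TAB inside a string token is not whitespace but a raw
# control character, which the grammar does not allow in strings.
inside_string = '{"My_string": "Foo bar.\t Bar foo."}'
try:
    json.loads(inside_string)
except json.JSONDecodeError as exc:
    print("rejected:", exc)
```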

This means that a literal tab character is not allowed inside a JSON string. In a .json file, you need to escape it as \t:

{"My_string": "Foo bar.\t Bar foo."}
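A quick sketch of the escaped form in an actual file, using only the standard library; the file name example.json is made up for illustration. json.load reads the two-character sequence \t and returns a Python string containing one real TAB.

```python
import json
import tempfile
from pathlib import Path

# The escaped form exactly as it would appear in a .json file:
# a backslash followed by "t", not a TAB character.
json_text = r'{"My_string": "Foo bar.\t Bar foo."}'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"  # hypothetical file name
    path.write_text(json_text, encoding="utf-8")
    with path.open(encoding="utf-8") as f:
        data = json.load(f)

# The \t escape decodes to a single real TAB character.
print(repr(data["My_string"]))  # prints 'Foo bar.\t Bar foo.'
```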

In addition, if the JSON text is written as a Python string literal, you need to double-escape the tab:

foo = '{"My_string": "Foo bar.\\t Bar foo."}' # in a Python source

Or use a Python raw string literal:

foo = r'{"My_string": "Foo bar.\t Bar foo."}' # in a Python source
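To see why the escaping level matters, this sketch compares the three Python literals side by side with the standard-library json module (the variable names are illustrative). The double-escaped and raw literals produce identical JSON text, while the single-escaped one hands the parser a raw TAB.

```python
import json

single = '{"My_string": "Foo bar.\t Bar foo."}'   # Python turns \t into a TAB
double = '{"My_string": "Foo bar.\\t Bar foo."}'  # Python turns \\ into one backslash
raw = r'{"My_string": "Foo bar.\t Bar foo."}'     # raw literal keeps \t as two characters

# The double-escaped and raw literals produce the same JSON text,
# containing backslash + "t" rather than a TAB.
print(double == raw)                    # prints True
print("\t" in double, "\\t" in double)  # prints False True

# That text parses to a Python string with one real TAB in it.
print(json.loads(raw) == {"My_string": "Foo bar.\t Bar foo."})  # prints True

# The single-escaped literal contains a raw TAB and is rejected.
try:
    json.loads(single)
except json.JSONDecodeError as exc:
    print("single-escaped literal rejected:", exc.msg)
```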