Component 10 - 24 days ago
JSON Question

Can't create JSON doc from dict with string containing line feed chars

I'm creating a JSON structure which I ultimately need to save to a file but am having problems with embedded line feed characters.

I first create a dictionary:

changes = {
    "20161101": "Added logging",
    "20161027": "Fixed scrolling bug",
    "20161024": "Added summary functionality"
}


and then convert it to a single line-feed separated string:

changes_str = '\n'.join([ "{0} - {1}".format(x, y) for x, y in changes.items() ])
print changes_str
'20161101 - Added logging\n20161027 - Fixed scrolling bug\n20161024 - Added summary functionality'


So far, so good. Now I add it into a string (which in reality would come from a text template):

changes_str_json_str = '{ "version": 1.1, "changes": "' + changes_str + '" }'
print changes_str_json_str
'{ "version": 1.1, "changes": "20161101 - Added logging\n20161027 - Fixed scrolling bug\n20161024 - Added summary functionality" }'


but when I try to parse this into a JSON object using json.loads, I hit problems:

json_obj = json.loads(changes_str_json_str)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/opt/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 55 (char 54)


Changing the line feed to another character does fix the problem, so clearly that's where the problem lies. However, I do need the character to be a line feed, as ultimately the data in the file needs to be formatted like this (the file is passed on to another system over which I have no control). Also, as far as I know, line feed is a supported character in JSON strings.

What exactly is the problem here and how can I work around it?

Answer

JSON strings cannot contain raw control characters such as a line feed; they have to be escaped, for example as \n. Here's an example of what's currently happening:

>>> import json
>>> json.loads('"foo\nbar"')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python35\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\python35\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\python35\lib\json\decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 5 (char 4)

If you properly escape the newline character with a backslash, so that the JSON text contains the two characters \ and n, it works as expected:

>>> json.loads('"foo\\nbar"')
'foo\nbar'
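
To see why the doubled backslash matters, it can help to look at the exact characters the parser receives. The following is a small sketch (the variable names are only for illustration): the JSON text has to contain a backslash followed by n, and json.dumps produces that form from a Python string holding a real line feed.

```python
import json

invalid_text = '"foo\nbar"'    # Python turns \n into a real line feed (1 char)
valid_text = '"foo\\nbar"'     # JSON text holds a backslash followed by n (2 chars)

# The valid text is one character longer than the invalid one.
length_difference = len(valid_text) - len(invalid_text)

# json.dumps escapes the line feed and yields the valid form.
encoded = json.dumps('foo\nbar')

# Decoding the valid text gives back a string with a real line feed.
decoded = json.loads(valid_text)
```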

So you could fix your code by joining with an escaped newline instead:

changes_str = '\\n'.join([ "{0} - {1}".format(x, y) for x, y in changes.items() ])
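
Replacing only the separator works for this data, but a double quote or backslash inside one of the change descriptions would break the hand-built string again. If the surrounding text really has to come from a template, one option is to let json.dumps encode just the string value and substitute the result, quotes included. A sketch, assuming a hypothetical template with a %s placeholder and a made-up description containing quotes:

```python
import json

changes = {
    "20161101": "Added logging",
    "20161027": 'Fixed "scrolling" bug',   # embedded quotes, to show escaping
}
changes_str = '\n'.join(
    "{0} - {1}".format(x, y) for x, y in sorted(changes.items())
)

# Hypothetical template: there are no quotes around the placeholder,
# because json.dumps supplies them along with all required escapes.
template = '{ "version": 1.1, "changes": %s }'
changes_str_json_str = template % json.dumps(changes_str)

json_obj = json.loads(changes_str_json_str)
```

The parsed value in json_obj["changes"] is the original string again, with real line feeds and the embedded quotes intact.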

The better alternative is to first construct the object you want to output and then call json.dumps on it, so you don't have to worry about escaping at all:

obj = {
    'version': 1.1,
    'changes': changes_str
}
changes_str_json_str = json.dumps(obj)
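
Since the goal is a file, the same object can also be written directly with json.dump and read back to confirm that the line feeds survive. A sketch, assuming a temporary file stands in for the real output path:

```python
import json
import os
import tempfile

changes = {
    "20161101": "Added logging",
    "20161027": "Fixed scrolling bug",
    "20161024": "Added summary functionality",
}
# Sort newest first so the order does not depend on dict ordering.
changes_str = '\n'.join(
    "{0} - {1}".format(x, y) for x, y in sorted(changes.items(), reverse=True)
)

obj = {'version': 1.1, 'changes': changes_str}

# Placeholder path; a real script would use its own file name.
path = os.path.join(tempfile.mkdtemp(), 'changes.json')
with open(path, 'w') as f:
    json.dump(obj, f)

with open(path) as f:
    file_text = f.read()           # line feeds are stored as backslash + n
with open(path) as f:
    round_tripped = json.load(f)   # decoding restores the real line feeds
```

In the file itself each line feed is stored as the escape sequence \n, which is what any conforming JSON parser on the receiving side expects.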