Carps Carps - 1 year ago 121
Python Question

How to decode JSON and faithfully represent backslashes in Python using the json library

I have a config file which I'm trying to pull apart and then reassemble with updated sections. The config file is in JSON format, and I'm trying to extract components from it, update them, and insert them into another JSON file.

The problem I'm finding is that sections of the JSON file use \/, which comes out as / when decoded using the JSON library for Python. I need to faithfully represent the original JSON once I insert the updated values back into the new JSON file, hence I need the missing \.

I suspect the \ is getting interpreted as an escape and being dropped by the JSON decoder.

Below is a sample of my efforts so far:

JSON string example:

{"Markup\/0.xaml":"text\/xml; charset=utf-8; format=xml; clrtype=ESRI.ArcGIS.Client.Graphic","Markup\/1.xaml":"text\/xml; charset=utf-8; format=xml; clrtype=ESRI.ArcGIS.Client.Graphic"}

Python Code:

import json

# full_json_path_old holds the path to the existing config file
with open(full_json_path_old, 'r+') as fo:
    data = json.load(fo)
print "DECODED STRING - ", data

Result of the Print:

u'Markup/0.xaml': u'text/xml; charset=utf-8; format=xml; clrtype=ESRI.ArcGIS.Client.Graphic',
u'Markup/1.xaml': u'text/xml; charset=utf-8; format=xml; clrtype=ESRI.ArcGIS.Client.Graphic'

Answer Source

Yes, the \ backslash is an escape character, and a proper JSON decoder will honour such a character as an escape. The Python JSON decoder is no exception. See section 7 of RFC 7159:

Any character may be escaped.


char = unescaped /
    escape (
        %x22 /          ; "    quotation mark  U+0022
        %x5C /          ; \    reverse solidus U+005C
        %x2F /          ; /    solidus         U+002F
        %x62 /          ; b    backspace       U+0008
        %x66 /          ; f    form feed       U+000C
        %x6E /          ; n    line feed       U+000A
        %x72 /          ; r    carriage return U+000D
        %x74 /          ; t    tab             U+0009
        %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C              ; \

(so the \/ sequence is the escape solidus sequence).

Your output is correct: I would expect MIME type components like text and xml to be delimited by a forward slash, not by \/.
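To illustrate that the escaped and unescaped forms carry the same data, here is a small sketch written for Python 3. The two sample strings are shortened versions of the JSON in the question; the variable names are only for illustration.

```python
import json

# Two JSON texts that differ only in whether the solidus is escaped.
escaped = r'{"Markup\/0.xaml": "text\/xml; charset=utf-8"}'
plain = '{"Markup/0.xaml": "text/xml; charset=utf-8"}'

decoded_escaped = json.loads(escaped)
decoded_plain = json.loads(plain)

# The decoder consumes the backslash as an escape, so both texts
# decode to the same dictionary.
print(decoded_escaped == decoded_plain)  # True
```

Any conforming JSON parser reading the rewritten file should therefore see identical keys and values, whichever form is on disk.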

A forward slash (solidus in the standard) does not have to be escaped, however. The same section 7 states which characters must be escaped:

All Unicode characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).
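A quick way to see this rule in practice is to encode a string containing one character from each category. This is a minimal sketch for Python 3; the sample string is made up for the demonstration.

```python
import json

# A quotation mark, a backslash, a tab (a control character) and a
# forward slash, all in one Python string.
sample = 'say "hi" \\ then\ttab and a/b'
encoded = json.dumps(sample)

# The first three are escaped in the output; the forward slash is
# written as-is.
print(encoded)

# Decoding gives back exactly the original string.
print(json.loads(encoded) == sample)  # True
```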

As such, the Python JSON encoder won't escape a forward slash when producing JSON output.
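If the rewritten file really must keep the \/ form (for example, because another tool compares the text byte for byte), one possible workaround is to post-process the encoder's output. The sketch below is written for Python 3 and is not a feature of the json module: dumps_with_escaped_solidus is a hypothetical helper name, and the approach relies on the assumption that a / can only appear inside string values in valid JSON.

```python
import json

def dumps_with_escaped_solidus(obj, **kwargs):
    """Serialise obj to JSON, then write every forward slash as \\/.

    Hypothetical helper, not part of the json module.
    """
    text = json.dumps(obj, **kwargs)
    # In valid JSON a "/" can only occur inside a string, so a plain
    # textual replacement cannot damage the structure.
    return text.replace("/", "\\/")

original = r'{"Markup\/0.xaml": "text\/xml; charset=utf-8"}'
data = json.loads(original)

# Example update to a decoded value before writing it back out.
data["Markup/0.xaml"] = "text/plain; charset=utf-8"

rewritten = dumps_with_escaped_solidus(data)
print(rewritten)  # {"Markup\/0.xaml": "text\/plain; charset=utf-8"}
```

The result still decodes to the same data, since \/ and / are equivalent to a JSON parser. Note that this escapes every slash in the output, including ones that were not escaped in the original file.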