Bin Chen - 4 months ago
JSON Question

python: json.dumps can't handle utf-8?

Below is the test program, including a Chinese character:

# -*- coding: utf-8 -*-
import json

j = {"d": "中", "e": "a"}
# Store the result under a different name so the json module is not shadowed
s = json.dumps(j, encoding="utf-8")

print s

Below is the result. Note that json.dumps converts the UTF-8 character to its numeric escape:

{"e": "a", "d": "\u4e2d"}

Why is this broken? Or am I doing something wrong?


You should read json.org. The complete JSON specification is in the white box on the right.

There is nothing wrong with the generated JSON. Generators are allowed to generate either UTF-8 strings or plain ASCII strings in which non-ASCII characters are escaped with the \uXXXX notation. In your case, the Python json module chose to escape, and produced the escaped notation \u4e2d.
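To make the two allowed forms concrete, here is a minimal sketch. It relies on the ensure_ascii parameter of json.dumps, which the question does not use, so treat it as a suggestion and not as part of the asker's setup. The variable names are made up for illustration, and sort_keys=True is there only to make the key order predictable.

```python
# -*- coding: utf-8 -*-
import json

# u"\u4e2d" is the Chinese character from the question, written as a
# unicode literal so the snippet behaves the same on Python 2.7 and 3.
data = {u"d": u"\u4e2d", u"e": u"a"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(data, sort_keys=True)

# ensure_ascii=False: the character is emitted as-is instead of escaped.
raw = json.dumps(data, sort_keys=True, ensure_ascii=False)

# Both strings are valid JSON and decode to the same data.
assert json.loads(escaped) == json.loads(raw) == data
```

If you want the unescaped form in a file, encode the resulting string as UTF-8 when writing it out.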

By the way: Any conforming JSON interpreter will correctly unescape this sequence again and give you back the actual character.
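A quick way to check this yourself is to feed the output from the question back into json.loads. This is only a sketch; the names text and decoded are chosen for illustration.

```python
import json

# The exact output from the question; the backslash is doubled so that
# Python passes the literal six characters \u4e2d to the JSON parser.
text = '{"e": "a", "d": "\\u4e2d"}'

decoded = json.loads(text)

# The parser turns the escape sequence back into a single character.
assert decoded["d"] == u"\u4e2d"
assert len(decoded["d"]) == 1
assert decoded["e"] == "a"
```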