Jason S - 1 month ago

JSON and escaping characters

I have a string that is serialized to JSON in JavaScript and then deserialized in Java.

The problem seems to appear whenever the string contains a degree symbol.

I could use some help in figuring out who to blame:

  • is it the SpiderMonkey 1.8 implementation? (it has a JSON implementation built in)

  • is it Google gson?

  • is it me for not doing something properly?

Here's what happens in JSDB:


I would have expected
which leads me to believe that SpiderMonkey's JSON implementation isn't doing the right thing... except that the syntax description on the JSON homepage (is that the spec?) says that a char can be


so maybe it passes the string along as-is without encoding it as \u00f8... in which case I would think the problem is with the gson library.

Can anyone help?

I suppose my workaround is to use either a different JSON library, or manually escape strings myself after calling
-- but if this is a bug then I'd like to file a bug report.
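
The manual-escaping workaround mentioned above could be sketched roughly as follows. This is only an illustration: `escapeNonAscii` is a hypothetical helper name, and the code assumes a modern JavaScript engine (arrow functions, `padStart`), which SpiderMonkey 1.8 may not provide. It relies on the fact that in valid JSON text, non-ASCII characters can only occur inside strings.

```javascript
// Hypothetical helper: replace every non-ASCII UTF-16 code unit in a JSON
// text with its \uXXXX escape, so the result is pure ASCII.
function escapeNonAscii(jsonText) {
  return jsonText.replace(/[\u0080-\uffff]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"));
}

// The degree sign is written here as a JS source escape to keep this file ASCII.
const asciiJson = escapeNonAscii(JSON.stringify("15\u00b0C"));
console.log(asciiJson); // "15\u00b0C" (the escape is now part of the JSON text)

// Any conforming parser decodes the escape back to the original character.
const roundTripped = JSON.parse(asciiJson);
```

Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which JSON parsers also accept.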


This is not a bug in either implementation. There is no requirement to escape U+00B0 (the degree sign). To quote RFC 4627:

2.5. Strings

The representation of strings is similar to conventions used in the C family of programming languages. A string begins and ends with quotation marks. All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F).

Any character may be escaped.
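
The rule quoted above can be checked with a small sketch, assuming any engine with a standards-conforming built-in `JSON` object. The variable names are illustrative only.

```javascript
// The degree sign (U+00B0) is not in the must-escape set, so it passes through.
const degree = JSON.stringify("15\u00b0C");     // JSON text: "15°C"

// These three kinds of character must be escaped, and are.
const quote = JSON.stringify('say "hi"');       // JSON text: "say \"hi\""
const backslash = JSON.stringify("a\\b");       // JSON text: "a\\b"
const control = JSON.stringify("\u0001");       // JSON text: "\u0001"

// Escaping is optional for everything else: both forms parse to the same string.
const fromRaw = JSON.parse('"15\u00b0C"');      // JSON text holds a literal degree sign
const fromEscaped = JSON.parse('"15\\u00b0C"'); // JSON text holds the six characters \u00b0
console.log(fromRaw === fromEscaped);           // true
```

A parser that rejects or mangles the unescaped form is therefore the exception, which is why the next place to look is the byte encoding used between the two programs.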

Escaping everything inflates the size of the data: every code point can be represented in four or fewer bytes in any Unicode transformation format, whereas escaping makes each one six or twelve bytes.
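
To put rough numbers on that, here is a minimal sketch that counts UTF-8 bytes for the raw and escaped forms. It assumes a runtime where `TextEncoder` is available as a global (current browsers and Node.js); `utf8Bytes` is a hypothetical helper.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

// JSON text for "15°C", once with a literal degree sign and once with \u00b0.
const rawDegreeBytes = utf8Bytes('"15\u00b0C"');        // 7  (the degree sign takes 2 bytes)
const escapedDegreeBytes = utf8Bytes('"15\\u00b0C"');   // 11 (the escape takes 6 bytes)

// A code point outside the BMP (U+1F600) needs a surrogate pair when escaped.
const rawEmojiBytes = utf8Bytes('"\u{1F600}"');          // 6  (4 bytes for the code point)
const escapedEmojiBytes = utf8Bytes('"\\ud83d\\ude00"'); // 14 (12 bytes for two escapes)

console.log(rawDegreeBytes, escapedDegreeBytes, rawEmojiBytes, escapedEmojiBytes);
```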

It is more likely that you have a text transcoding bug somewhere in your code, and escaping everything outside the ASCII subset merely masks the problem. The JSON spec requires that all data use a Unicode encoding.
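
As an illustration of how such a transcoding bug behaves (not a diagnosis of the code in the question), the sketch below writes JSON as UTF-8 and then reads it back with the wrong charset. It assumes Node.js and its `Buffer` API; Latin-1 stands in for whatever mismatched encoding might be involved.

```javascript
// JSON text containing a literal degree sign, sent over the wire as UTF-8.
const jsonText = JSON.stringify("15\u00b0C");
const wireBytes = Buffer.from(jsonText, "utf8");

// Decoding with the right charset recovers the original text.
const decodedCorrectly = JSON.parse(wireBytes.toString("utf8"));  // "15°C"

// Decoding the same bytes as Latin-1 turns the 2-byte sequence C2 B0 into
// two characters, U+00C2 followed by U+00B0.
const decodedWrongly = JSON.parse(wireBytes.toString("latin1"));  // "15Â°C"

// An ASCII-only (escaped) JSON text survives the same mistake unchanged,
// which is how escaping can hide the underlying bug.
const asciiBytes = Buffer.from('"15\\u00b0C"', "utf8");
const maskedResult = JSON.parse(asciiBytes.toString("latin1"));   // "15°C"
```

If the receiving side shows an extra character in front of the degree sign, or a different character altogether, the charset used to read the bytes is the first thing to check.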