Sebastien Damaye - 1 month ago
JSON Question

encode unicode characters to unicode escape sequences

I have a CSV file containing sites along with their addresses. I need to process this file to produce a JSON file that I will use in Django to load initial data into my database. To do that, I need to convert all special characters from the CSV file to Unicode escape sequences.

Here is an example:

Örnsköldsvik;SE;Ornskoldsvik;Ångermanlandsgatan 28 A


It should be converted to:

\u00D6rnsk\u00F6ldsvik;SE;Ornskoldsvik;\u00C5ngermanlandsgatan 28 A


The following site does exactly the conversion I'm expecting: http://itpro.cz/juniconv/ but I'd like to find a way to do it from the command line (bash) or in Python. I've already tried using iconv, uconv and some Python scripts without real success.

What kind of script is running behind the juniconv website?

Thank you in advance for any suggestions.

Answer

If you want to get Unicode escapes in Python similar to the ones Java uses, you could use the JSON format:

>>> import json
>>> import sys
>>> s = u'Örnsköldsvik;SE;Ornskoldsvik;Ångermanlandsgatan 28 A'
>>> json.dump(s, sys.stdout)
"\u00d6rnsk\u00f6ldsvik;SE;Ornskoldsvik;\u00c5ngermanlandsgatan 28 A"
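Since the goal in the question is a JSON file for Django, you may not need to escape the text yourself at all: serializing the whole structure with the json module escapes every string in it. Here is a minimal Python 3 sketch of that idea. The model label sites.site, the field names, and the helper rows_to_fixture are hypothetical placeholders, not something taken from the question.

```python
import csv
import io
import json


def rows_to_fixture(csv_text, model="sites.site"):
    """Build a Django-style fixture string from semicolon-separated rows.

    The model label and the field names are hypothetical placeholders.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    fixture = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, country, ascii_name, address = row
        fixture.append({
            "model": model,
            "pk": len(fixture) + 1,
            "fields": {
                "name": name,
                "country": country,
                "ascii_name": ascii_name,
                "address": address,
            },
        })
    # ensure_ascii defaults to True, so non-ASCII characters come out
    # as \uXXXX escapes in the serialized text.
    return json.dumps(fixture, indent=2)


print(rows_to_fixture("Örnsköldsvik;SE;Ornskoldsvik;Ångermanlandsgatan 28 A\n"))
```

Any JSON parser, including the one Django uses to load fixtures, turns the escapes back into the original characters, so the lowercase hex digits make no difference.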

There is also the unicode-escape codec, but you shouldn't use it: it produces Python-specific escaping (the way Python Unicode string literals look):

>>> print(s.encode('unicode-escape').decode('ascii'))  # works on Python 2 and 3
\xd6rnsk\xf6ldsvik;SE;Ornskoldsvik;\xc5ngermanlandsgatan 28 A
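Neither of the above gives the uppercase hex digits shown in the question. I don't know what script juniconv actually runs, but if you need that exact form, a small hand-written function is enough. This is only a Python 3 sketch; the name escape_non_ascii is made up for illustration.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a \\uXXXX escape (uppercase hex)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII is copied through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            # Outside the Basic Multilingual Plane: emit a UTF-16
            # surrogate pair, the form Java and JSON use.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04X\\u%04X" % (high, low))
    return "".join(out)


print(escape_non_ascii("Örnsköldsvik;SE;Ornskoldsvik;Ångermanlandsgatan 28 A"))
# prints: \u00D6rnsk\u00F6ldsvik;SE;Ornskoldsvik;\u00C5ngermanlandsgatan 28 A
```

Unlike json.dumps, this leaves quotes and backslashes untouched, so the result is not a valid JSON string on its own if the input contains those characters.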