laurt - 1 year ago
Python Question

Writing Unicode to file with Python

My problem is that I can print Unicode characters to my terminal, but not when I redirect the output to a file. Demonstration:

user@ubuntu:~$ python -c 'print u"\u5000"'
倀
user@ubuntu:~$ python -c 'print u"\u5000"' >a.out
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u5000' in position 0: ordinal not in range(128)


Output of "locale":

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Answer

Because your terminal is set to use UTF-8, Python knows how to encode a Unicode character when writing directly to the terminal. When the output is redirected to a file, however, stdout is no longer a terminal, so Python 2 has no encoding to detect (sys.stdout.encoding is None) and falls back to ASCII. To write to the file, you need to encode the string to bytes explicitly:

python -c 'print u"\u5000".encode("UTF-8")' >a.out
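If the output is produced by a script rather than a one-liner, another option is to open the file yourself with an explicit encoding instead of relying on shell redirection. The following is a minimal sketch, not the only way to do it: the helper names write_text and read_bytes and the temporary file location are made up for illustration. It uses io.open, which accepts an encoding argument in both Python 2 and Python 3.

```python
import io
import os
import tempfile


def write_text(path, text, encoding="utf-8"):
    # Hypothetical helper: io.open encodes the text on write, so the
    # result does not depend on sys.stdout.encoding or on redirection.
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def read_bytes(path):
    # Hypothetical helper: read the file back as raw bytes for inspection.
    with io.open(path, "rb") as handle:
        return handle.read()


# Demo: write U+5000 and a newline to a.out in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "a.out")
write_text(demo_path, u"\u5000\n")
raw = read_bytes(demo_path)
```

Here raw holds the UTF-8 bytes that ended up on disk, and raw.decode("utf-8") gives the original character back. If you would rather keep the print statement unchanged, setting the PYTHONIOENCODING environment variable (for example PYTHONIOENCODING=UTF-8 in front of the python command) should also make the redirected stdout use UTF-8.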