laurt laurt - 2 months ago 20
Python Question

Writing Unicode to file with Python

My problem is, I can output Unicode charaters into my terminal but not into files. Demonstration:

user@ubuntu:~$ python -c 'print u"\u5000"'
倀
user@ubuntu:~$ python -c 'print u"\u5000"' >a.out
Traceback (most recent call last):
File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u5000' in position 0: ordinal not in range(128)


Output of "locale":

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Answer

Because your terminal is set to use UTF-8, Python knows how to encode a Unicode character when writing directly to the terminal. When writing to the file, however, there is no encoding specified, so Python defaults to ASCII. To write to the file, you need to explicitly specify a byte encoding.

python -c 'print u"\u5000".encode("UTF-8")' >a.out