I carefully read the Unicode pain article a few days ago and asked this question a few hours ago:
Do I have to encode unicode variable before write to file?
But then a strange question occurred to me.
I found that this code works fine:
chinese = ['中文', '你好'] # py2, these are bytes, type is str
with open('filename', 'wb') as f:
Since I can declare a variable directly with any unicode characters [...]
But that's not what you've done. They may look like characters, but they are encoded as bytes in the source file. If you try to do anything actually useful with the values, e.g. slice, subscript, take the length of, then everything breaks down. That is the "Unicode pain".
>>> '中文'[1]
'\xb8'
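To make the difference concrete, here is a minimal sketch comparing the byte string with a real unicode string. It assumes the source file and terminal use UTF-8. The names `text`, `raw` and `sliced_ok` are only for illustration, and the explicit `.encode('utf-8')` stands in for what a plain `'中文'` literal already holds in Python 2, so that the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8.

text = u'中文'                # unicode string: two characters
raw = text.encode('utf-8')   # the bytes a plain Python 2 str literal would contain

print(len(text))  # 2 characters
print(len(raw))   # 6 bytes, three per character in UTF-8

# Slicing the unicode string yields a whole character.
assert text[:1] == u'中'

# Slicing the byte string cuts a character apart:
# the result is no longer valid UTF-8.
try:
    raw[:1].decode('utf-8')
    sliced_ok = True
except UnicodeDecodeError:
    sliced_ok = False

print(sliced_ok)  # False
```

Writing `raw` to a file opened in `'wb'` mode works precisely because no text operation is involved; the bytes are copied through untouched.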