I'm wondering how to get the Unicode representation of Arabic strings like
Assuming you have an actual Unicode string, you can do:

    # -*- coding: utf-8 -*-
    s = u'سلام'
    print s.encode('unicode-escape')
The # -*- coding: utf-8 -*- directive only tells the interpreter that the source code is UTF-8 encoded; it has no bearing on how the script itself handles Unicode.
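To illustrate that the coding directive only affects how the source file is read, here is a minimal sketch that should run on both Python 2.7 and Python 3. The variable names are illustrative, and the `__future__` import is an assumption made so that `print` behaves the same on both versions. The same string is written once as a UTF-8 literal and once with ASCII-only escapes, and the two compare equal.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Written with Arabic characters; needs the coding line on Python 2.
literal = u'سلام'

# Written with escapes only; the source line itself is pure ASCII.
from_escapes = u'\u0633\u0644\u0627\u0645'

# Both spellings produce the same four-character Unicode string.
same = (literal == from_escapes)

# encode() returns bytes on Python 3 and a plain str on Python 2,
# so decode as ASCII before printing to get identical output on both.
escaped = literal.encode('unicode-escape')
print(escaped.decode('ascii'))  # \u0633\u0644\u0627\u0645
```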
If your script is reading that Arabic string from a UTF-8 encoded source, the bytes will look like this:

    \xd8\xb3\xd9\x84\xd8\xa7\xd9\x85
You can convert that to Unicode like this:
    data = '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
    s = data.decode('utf8')
    print s
    print s.encode('unicode-escape')
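For readers on Python 3, or who want code that runs on both versions, the same decoding step might look like the sketch below. The explicit `b''` prefix and the `code_points` list are additions for illustration and are not part of the original answer.

```python
from __future__ import print_function

# UTF-8 bytes as they would arrive from a file or socket.
data = b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'

# Decode the bytes into a Unicode string (unicode on Python 2, str on Python 3).
s = data.decode('utf-8')

# List each character's code point in the usual U+XXXX notation.
code_points = ['U+%04X' % ord(ch) for ch in s]
print(code_points)  # ['U+0633', 'U+0644', 'U+0627', 'U+0645']
```

Eight bytes decode to four characters here because each of these Arabic letters takes two bytes in UTF-8.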
Of course, you do need to make sure that your terminal is set up to handle Unicode properly.
Note that in Python 2, '\u0633\u0644\u0627\u0645' is a plain (byte) string containing 24 bytes, whereas u'\u0633\u0644\u0627\u0645' is a Unicode string containing 4 Unicode characters.
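The difference between 24 bytes and 4 characters can be checked directly. This is a small sketch, assuming the escaped text is held as a byte string; the `b''` prefix and doubled backslashes are used so that it means the same thing on Python 2 and Python 3.

```python
from __future__ import print_function

# The escape sequences stored as literal text: a backslash, 'u' and four
# hex digits per character, so 6 bytes for each of the 4 characters.
escaped_bytes = b'\\u0633\\u0644\\u0627\\u0645'

# Interpreting those escapes yields the real Unicode string.
text = escaped_bytes.decode('unicode-escape')

print(len(escaped_bytes))  # 24
print(len(text))           # 4
```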
You may find this article helpful: Pragmatic Unicode, which was written by SO veteran Ned Batchelder.