supermario - 2 months ago
Python Question

How to get the Unicode representation of Arabic strings in Django?

I'm wondering how to get the Unicode representation of Arabic strings like

سلام

in Python.

The result should be

\u0633\u0644\u0627\u0645
I need this so I can compare text retrieved from a MySQL database with data stored in a Redis cache.


Assuming you have an actual Unicode string, you can do this:

# -*- coding: utf-8 -*-
s = u'سلام'                         # a Unicode string of 4 characters
print s.encode('unicode-escape')    # prints \u0633\u0644\u0627\u0645
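The snippet above is Python 2. As a rough equivalent for Python 3 (an assumption about your environment), where str is already Unicode, the same escape sequence can be produced with the built-in unicode_escape codec, or by formatting each code point by hand. The helper to_u_escapes is hypothetical and only matches the codec for non-ASCII characters in the Basic Multilingual Plane, such as this Arabic text.

```python
s = 'سلام'  # Python 3 str: 4 Unicode code points

# Option 1: unicode_escape returns ASCII bytes, so decode them back to a str
escaped = s.encode('unicode_escape').decode('ascii')

# Option 2 (hypothetical helper): write every character as \uXXXX.
# Unlike the codec, this also escapes ASCII and only handles BMP code points.
def to_u_escapes(text):
    return ''.join('\\u{:04x}'.format(ord(ch)) for ch in text)

print(escaped)          # \u0633\u0644\u0627\u0645
print(to_u_escapes(s))  # \u0633\u0644\u0627\u0645
```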



The # -*- coding: utf-8 -*- directive only tells the interpreter that the source file is UTF-8 encoded; it has no bearing on how the script handles Unicode data at runtime.

If your script is reading that Arabic string from a UTF-8 encoded source, the bytes will look like this:

'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
You can convert that to Unicode like this:

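data = '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'  # 8 UTF-8 bytes (a Python 2 str)
s = data.decode('utf8')                    # decode the bytes into a Unicode string
print s                                    # needs a Unicode-capable terminal
print s.encode('unicode-escape')           # prints \u0633\u0644\u0627\u0645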
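For comparison, a sketch of the same decoding step in Python 3 (again assuming Python 3 rather than the Python 2 used above), where raw data is a bytes object and must be decoded explicitly. Encoding the result again gives back the original bytes.

```python
data = b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'  # UTF-8 bytes, e.g. read from a file or socket
s = data.decode('utf-8')                     # bytes -> str (4 code points)

print(len(data), len(s))                           # 8 4
print(s.encode('unicode_escape').decode('ascii'))  # \u0633\u0644\u0627\u0645

# Round trip: encoding the str as UTF-8 reproduces the original bytes
assert s.encode('utf-8') == data
```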



Of course, you do need to make sure that your terminal is set up to handle Unicode properly.
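For the comparison the question describes, escape sequences are not strictly needed: once both values are decoded to Unicode, they can be compared directly. A minimal Python 3 sketch, assuming the cached value comes back as UTF-8 bytes (redis-py returns bytes unless the client is created with decode_responses=True) and the database driver already returns a str. The helper as_text and the sample values are hypothetical stand-ins for real query and cache results.

```python
def as_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as str, decoding bytes if necessary."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Hypothetical sample values standing in for a MySQL row and a Redis lookup
from_mysql = 'سلام'
from_redis = b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'

print(from_mysql == from_redis)                    # False: str never equals bytes
print(as_text(from_mysql) == as_text(from_redis))  # True
```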

Note that

'\u0633\u0644\u0627\u0645'

is a plain (byte) string containing 24 bytes, whereas

u'\u0633\u0644\u0627\u0645'

is a Unicode string containing 4 Unicode characters.
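The same 24-versus-4 distinction in Python 3 terms, as a sketch: because every Python 3 str literal is Unicode, the 24-byte version has to be written as a bytes literal with doubled backslashes. The unicode_escape codec can also turn such escape sequences back into real characters, which may help if your cache holds the escaped form.

```python
escaped = b'\\u0633\\u0644\\u0627\\u0645'  # 24 bytes: backslash, 'u', 4 hex digits, x4
text = '\u0633\u0644\u0627\u0645'          # escapes resolved by the parser: 4 characters

print(len(escaped), len(text))  # 24 4

# Interpret the escape sequences to recover the real characters
decoded = escaped.decode('unicode_escape')
print(decoded == text)          # True
```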

You may find this article helpful: Pragmatic Unicode, written by Stack Overflow veteran Ned Batchelder.