How to get the Unicode representation of Arabic strings in Django?

I'm wondering how to get the Unicode representation of Arabic strings like سلام in Python.

The result should be

\u0633\u0644\u0627\u0645

I need this so that I can compare text retrieved from a MySQL database with data stored in a Redis cache.


Assuming you have an actual Unicode string, you can do

# -*- coding: utf-8 -*-
s = u'سلام'
print s.encode('unicode-escape')

which prints

\u0633\u0644\u0627\u0645

The # -*- coding: utf-8 -*- directive only tells the interpreter that the source code is UTF-8 encoded; it has no bearing on how the script itself handles Unicode.
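The snippet above is Python 2. For comparison, here is a minimal sketch of the same conversion assuming Python 3, where source files are UTF-8 by default (so the directive can be dropped) and str is already Unicode. In Python 3, encode('unicode-escape') returns bytes, so the result is decoded as ASCII to get a printable str.

```python
# Python 3: source files default to UTF-8, so no coding directive is needed.
s = 'سلام'  # a str, i.e. a sequence of Unicode code points

# encode('unicode-escape') yields bytes; every byte is ASCII, so decode as ASCII
escaped = s.encode('unicode-escape').decode('ascii')
print(escaped)  # prints \u0633\u0644\u0627\u0645

# The escaped literal and the Arabic literal denote the same string
assert '\u0633\u0644\u0627\u0645' == s
```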

If your script is reading that Arabic string from a UTF-8 encoded source, the bytes will look like this:

'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'

You can convert that to Unicode like this:

data = '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
s = data.decode('utf8')
print s
print s.encode('unicode-escape')  
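The same decoding step in Python 3 is sketched below, assuming Python 3 semantics: a byte literal needs the b prefix, and decode() turns it into a str. The Arabic text itself is not printed here, so the sketch does not depend on the terminal's encoding.

```python
# Python 3: a UTF-8 byte string needs the b prefix
data = b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'

s = data.decode('utf-8')  # bytes -> str (Unicode code points)
print(s.encode('unicode-escape').decode('ascii'))  # prints \u0633\u0644\u0627\u0645

# Each of these Arabic letters occupies 2 bytes in UTF-8
assert len(data) == 8
assert len(s) == 4
```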



Of course, you do need to make sure that your terminal is set up to handle Unicode properly.
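Printing is only needed for display. For the goal stated in the question (comparing MySQL text with Redis data), comparing decoded strings directly avoids both the terminal and the escape step. This is a minimal Python 3 sketch, assuming the MySQL driver returns str and the Redis client returns raw UTF-8 bytes (redis-py does this unless decode_responses=True is set). db_value, cache_value and as_text are hypothetical names standing in for real query results and helper code.

```python
# Hypothetical stand-ins for values fetched from the two stores
db_value = 'سلام'                                  # str from the MySQL driver
cache_value = b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'  # UTF-8 bytes from Redis


def as_text(value):
    """Return value as str, decoding UTF-8 bytes if necessary (hypothetical helper)."""
    return value.decode('utf-8') if isinstance(value, bytes) else value


print(as_text(db_value) == as_text(cache_value))  # prints True
print(db_value == cache_value)  # prints False: a str never equals bytes
```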

Note that

'\u0633\u0644\u0627\u0645'

is a plain (byte) string containing 24 bytes, whereas

u'\u0633\u0644\u0627\u0645'

is a Unicode string containing 4 Unicode characters.
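The byte-string versus Unicode-string distinction can be checked directly. The sketch below assumes Python 3, where a plain string literal is already Unicode, so the escaped form is written as an explicit bytes literal with doubled backslashes. It also shows decode('unicode-escape') reversing the escaping.

```python
# The escaped form is plain ASCII text: 4 escapes x 6 characters = 24 bytes
escaped = b'\\u0633\\u0644\\u0627\\u0645'
assert len(escaped) == 24

# The real Unicode string has only 4 code points
s = '\u0633\u0644\u0627\u0645'
assert len(s) == 4

# decode('unicode-escape') turns the escape sequences back into characters
assert escaped.decode('unicode-escape') == s
print(len(escaped), len(s))  # prints 24 4
```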

You may find this article helpful: Pragmatic Unicode, which was written by SO veteran Ned Batchelder.