Bender Rodriguez - 18 days ago
Python Question

Python. Difference between unicode+variable and u+constant?

Can someone please tell me how to fix this?

This works:

nOrd = (ord(u'ط'))


But this fails:

s="‎ط"
s=unicode(s, 'utf-8')
nOrd = (ord((s)))


The error I get is:

TypeError: ord() expected a character, but string of length 2 found

Answer

Your second s is simply not the same text as the first example:

>>> u'ط'
u'\u0637'
>>> u'ط'.encode('utf8')
'\xd8\xb7'
>>> s="‎ط"
>>> s
'\xe2\x80\x8e\xd8\xb7'
>>> s.decode('utf8')
u'\u200e\u0637'

You have a U+200E LEFT-TO-RIGHT MARK character in the second example, an invisible character sitting before the ط. That makes the string two characters long, not one, which is why ord() raises the TypeError.
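If you are not sure what a string actually contains, listing the code point and Unicode name of each character makes invisible ones show up. This is only a minimal sketch: describe_chars is a hypothetical helper name, not something from the question, and it assumes the text has already been decoded to unicode.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name) pairs for each character in text.

    Hypothetical helper for illustration; expects a unicode string
    (decode byte strings first, e.g. s.decode('utf8') on Python 2).
    """
    return [(u'U+%04X' % ord(ch), unicodedata.name(ch, u'<unnamed>'))
            for ch in text]


# The decoded string from the failing example, written with escapes
# so the invisible mark is visible in the source.
s = u'\u200e\u0637'
for code, name in describe_chars(s):
    print(code, name)
# U+200E LEFT-TO-RIGHT MARK
# U+0637 ARABIC LETTER TAH
```

Characters like this usually come along when text is copied from a web page or a document with mixed left-to-right and right-to-left content.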

Once the string has been decoded to unicode, you could remove the mark with unicode.lstrip() or unicode.replace(); the first only removes it from the start of the string, the second removes it everywhere:

# s must already be a unicode object here (e.g. s = s.decode('utf8'))
s = s.lstrip(u'\u200e')
# or
s = s.replace(u'\u200e', u'')
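If the input might carry other invisible marks as well (U+200F RIGHT-TO-LEFT MARK, for instance), one possible generalisation is to drop every character in the Unicode "Cf" (format) category. This is a sketch under that assumption rather than a drop-in fix: strip_format_chars is a hypothetical name, and "Cf" also covers the zero-width joiner and non-joiner, which can be meaningful in some scripts, so check that this is acceptable for your data.

```python
import unicodedata


def strip_format_chars(text):
    """Return text without any Unicode category Cf (format) characters.

    Hypothetical helper for illustration; expects a unicode string.
    """
    return u''.join(ch for ch in text if unicodedata.category(ch) != 'Cf')


s = strip_format_chars(u'\u200e\u0637')
nOrd = ord(s)  # s is now the single character u'\u0637', so ord() works
```

With the mark removed, nOrd is 0x637, the same value the u'ط' literal gives in the working example.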