I have a function like this:
persian_numbers = '۱۲۳۴۵۶۷۸۹۰'
english_numbers = '1234567890'
arabic_numbers = '١٢٣٤٥٦٧٨٩٠'
english_trans = string.maketrans(english_numbers, persian_numbers)
arabic_trans = string.maketrans(arabic_numbers, persian_numbers)
english_translate = string.maketrans(english_numbers, persian_numbers)
ValueError: maketrans arguments must have same length
Unicode objects can interpret these digits (arabic and persian) as actual digits - no need to translate them by using character substitution.
EDIT - I came up with a way to make your replacement using Python 2 regular expressions:
# coding: utf-8
import re

# Attention: while the characters in the strings below are
# displayed identically, internally they are represented
# by distinct unicode codepoints
persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
arabic_numbers = u'١٢٣٤٥٦٧٨٩٠'
english_numbers = u'1234567890'

persian_regexp = u"(%s)" % u"|".join(persian_numbers)
arabic_regexp = u"(%s)" % u"|".join(arabic_numbers)

def _sub(match_object, digits):
    # Look up the matched digit's position and return the English digit there
    return english_numbers[digits.find(match_object.group(0))]

def _sub_arabic(match_object):
    return _sub(match_object, arabic_numbers)

def _sub_persian(match_object):
    return _sub(match_object, persian_numbers)

def replace_arabic(text):
    return re.sub(arabic_regexp, _sub_arabic, text)

def replace_persian(text):
    # Match against the Persian pattern, not the Arabic one
    return re.sub(persian_regexp, _sub_persian, text)
Note that the "text" parameter must itself be unicode.
(This code could also be shortened by using lambdas and combining some expressions into a single line, but there is no point in doing so: it would only cost readability.)
This should work for you as it is, but please read on to the original answer I had posted.
-- original answer
So, if you instantiate your variables as unicode (prepending a u to the quote char), they are correctly understood in Python:
>>> persian_numbers = u'۱۲۳۴۵۶۷۸۹۰'
>>> english_numbers = u'1234567890'
>>> arabic_numbers = u'١٢٣٤٥٦٧٨٩٠'
>>>
>>> print int(persian_numbers)
1234567890
>>> print int(english_numbers)
1234567890
>>> print int(arabic_numbers)
1234567890
>>> persian_numbers.isdigit()
True
>>>
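Since Python already knows the numeric value of each of these characters, one possible way to normalize digits inside mixed text is to ask the unicodedata module for that value. This is only a sketch, not part of the original answer: the helper name normalize_digits is hypothetical, and the digits are written as \u escapes so the code points are explicit (U+06F1.. are Persian, U+0661.. are Arabic-Indic).

```python
import unicodedata


def normalize_digits(text):
    """Replace every Unicode decimal digit in `text` with its ASCII digit.

    `text` must be a unicode object in Python 2 (a str in Python 3).
    """
    result = []
    for ch in text:
        # unicodedata.decimal returns the digit's integer value, or the
        # default (None here) when the character is not a decimal digit.
        value = unicodedata.decimal(ch, None)
        result.append(ch if value is None else u'%d' % value)
    return u''.join(result)


# Persian 1-2-3, Arabic-Indic 4-5-6, ASCII 7-8-9 in one string.
mixed = u'tel: \u06f1\u06f2\u06f3-\u0664\u0665\u0666-789'
print(normalize_digits(mixed))
```

Unlike a substitution table, this approach is not limited to Persian and Arabic: any character that Unicode classifies as a decimal digit is converted, which may or may not be what you want.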
By the way, the "maketrans" method does not exist for unicode objects (in Python2 - see the comments).
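Even without maketrans, the translate method of unicode objects accepts a plain dict that maps code points (integers) to replacement characters, so the substitution the question asks for can be sketched as below. The names make_table and to_persian are hypothetical, and the digits are written as \u escapes so the code points are unambiguous. The same code should also run on Python 3, where str.maketrans could build the table instead.

```python
# Digits in the order 1..9, 0, matching the strings in the question.
persian_digits = u'\u06f1\u06f2\u06f3\u06f4\u06f5\u06f6\u06f7\u06f8\u06f9\u06f0'
arabic_digits = u'\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669\u0660'
english_digits = u'1234567890'


def make_table(source, target):
    """Map each code point in `source` to the character at the same index in `target`."""
    return dict((ord(src), dst) for src, dst in zip(source, target))


# One table that sends both English and Arabic-Indic digits to Persian ones.
to_persian_table = make_table(english_digits + arabic_digits, persian_digits * 2)


def to_persian(text):
    # `text` must be a unicode object in Python 2 (a str in Python 3);
    # characters that are not in the table are left unchanged.
    return text.translate(to_persian_table)
```

For example, to_persian(u'2024') returns the four Persian digits U+06F2 U+06F0 U+06F2 U+06F4.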
It is very important to understand the basics of unicode - for everyone, even people writing English-only programs who think they will never deal with any char outside the 26 Latin letters. When writing code that will deal with different chars it is vital - unless you know what you are doing, the program can only work by chance.
A very good article to read is http://www.joelonsoftware.com/articles/Unicode.html - please read it now. You can keep in mind, while reading it, that Python allows one to translate unicode characters to a string in any "physical" encoding by using the "encode" method of unicode objects.
>>> arabic_numbers = u'١٢٣٤٥٦٧٨٩٠'
>>> len(arabic_numbers)
10
>>> enc_arabic = arabic_numbers.encode("utf-8")
>>> print enc_arabic
١٢٣٤٥٦٧٨٩٠
>>> len(enc_arabic)
20
>>> int(enc_arabic)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '\xd9\xa1\xd9\xa2\xd9\xa3\xd9\xa4\xd9\xa5\xd9\xa6\xd9\xa7\xd9\xa8\xd9\xa9\xd9\xa0'
Thus, the characters lose their meaning as "single entities" and as digits when encoded - the encoded object (str type in Python 2.x) is just a string of bytes - which nonetheless is needed when sending these characters to any output from the program - be it console, GUI window, database, HTML code, etc...
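The reverse direction works the same way: bytes coming in from a file, socket or database can be decoded back to unicode before being treated as digits. A minimal sketch, assuming the input is UTF-8 (the helper name parse_utf8_int is hypothetical):

```python
def parse_utf8_int(raw_bytes):
    """Decode UTF-8 bytes to unicode, then let int() interpret the digits."""
    return int(raw_bytes.decode('utf-8'))


# Arabic-Indic digits 1..9, 0 written as escapes so the code points are explicit.
arabic_numbers = u'\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669\u0660'

encoded = arabic_numbers.encode('utf-8')  # 20 bytes: two per character
decoded = encoded.decode('utf-8')         # 10 characters again

print(len(encoded))
print(len(decoded))
print(parse_utf8_int(encoded))
```

The general rule is to decode as early as possible on input, work with unicode objects inside the program, and encode only at the point of output.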