F. Boudin - 1 month ago
Python Question

Python string.letters does not include locale diacritics

I am trying to get the locale-dependent alphabet from Python's string module, including the diacritics (e.g. éèêà for French), with no success. Here is a minimal example:

import locale, string

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
print string.letters
# shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')
print string.letters
# also shows ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz


The Python documentation says that string.letters is locale-dependent, but that does not seem to work for me.

What am I doing wrong, and is this the right way to obtain a language-dependent alphabet?

Edit: I checked the locale with print locale.getlocale() after setting it, and it is correctly changed.

Answer

In Python 2.7 (there is no string.letters in Python 3.x), it works if you set the locale to 'fr_FR' (equivalent to 'fr_FR.ISO8859-1') rather than 'fr_FR.UTF-8'. The session below shows the same behaviour with 'es_ES':

>>> import locale, string
>>> locale.setlocale(locale.LC_ALL, 'es_ES')
'es_ES'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> locale.setlocale(locale.LC_ALL, 'es_ES.UTF-8')
'es_ES.UTF-8'
>>> string.letters
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'

So \xaa is the character "ª", \xb5 is "µ", \xd1 is "Ñ", and so on. These are ISO-8859-1 bytes, though, so they will look broken on a UTF-8 terminal unless they are decoded first.
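To check what those bytes stand for, they can be decoded with the encoding behind the locale. This is only a minimal sketch: it assumes the bytes are ISO-8859-1 (which is what 'es_ES' and 'fr_FR' appear to map to here; your system may differ), and the sample `raw` and the variable names are made up for illustration.

```python
from __future__ import print_function
import unicodedata

# A few of the non-ASCII bytes taken from the string.letters output above.
raw = b'\xaa\xb5\xba\xc0\xd1\xe9\xff'

# Assumption: the locale's encoding is ISO-8859-1 (Latin-1).
decoded = raw.decode('iso-8859-1')

# Print the code point and Unicode name of each character; the output is
# plain ASCII, so it does not depend on the terminal encoding.
for ch in decoded:
    print(hex(ord(ch)), unicodedata.name(ch))
```

Printing `decoded` itself should show the accented letters correctly, provided the terminal encoding can represent them.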

I highly recommend reading this page: https://pythonhosted.org/kitchen/unicode-frustrations.html
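Since string.letters is gone in Python 3, a rough stand-in there is to ask Unicode which code points are letters, without touching the locale at all. The sketch below is an assumption about what "alphabet" should mean (every alphabetic code point below 256), not a language-specific list: it is the same for French and Spanish, and it misses letters outside Latin-1 such as "œ". The name `latin1_letters` is hypothetical.

```python
# Python 3 only: chr() returns a Unicode character and str.isalpha()
# follows the Unicode character database, not the current locale.
# Assumption: "letters" means all alphabetic code points below 256.
latin1_letters = ''.join(chr(i) for i in range(256) if chr(i).isalpha())

# Count the letters and list the non-ASCII ones as escaped code points,
# so the output stays ASCII whatever the terminal encoding is.
print(len(latin1_letters))
print(ascii(latin1_letters[52:]))
```

For a real per-language alphabet you would need locale data from elsewhere; this only covers the Latin-1 range.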