Johannes Schwaninger - 7 months ago
Python Question

Python list items encoding

Why does the encoding change in Python 2.7 when I iterate over the items of a list?

test_list = ['Hafst\xc3\xa4tter', 'asbds@ages.at']


Printing the list:

print(test_list)


gets me this output:

['Hafst\xc3\xa4tter', 'asbds@ages.at']


So far, so good. But why, when I iterate over the list like this:

for item in test_list:
    print(item)


do I get this output:

Hafstätter
asbds@ages.at


Why does the encoding change (or does it)? And how can I change the encoding within the list?

Answer

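The encoding isn't changing; these are just two different ways of displaying the same string. Printing a list shows the repr() of each item, which writes non-ASCII bytes as escape codes for debugging, while printing a single string writes its bytes unchanged and leaves the rendering to the terminal: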

>>> test_list = ['Hafst\xc3\xa4tter', 'asbds@ages.at']
>>> print(test_list)
['Hafst\xc3\xa4tter', 'asbds@ages.at']
>>> for item in test_list:
...     print(item)
...     
Hafstätter
asbds@ages.at

But they are equivalent (assuming a UTF-8 terminal, so that the typed ä is stored as the same two bytes):

>>> 'Hafst\xc3\xa4tter' == 'Hafstätter'
True
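
The mechanism behind the two displays can be checked directly. This is a minimal sketch, not part of the original answer: it uses b'' literals so that it behaves the same way on Python 2.7 (where b'...' is the plain str from the question) and on Python 3, and the names list_display, rebuilt and first are only illustrative.

```python
test_list = [b'Hafst\xc3\xa4tter', b'asbds@ages.at']

# str() of a list is built from the repr() of every element,
# which is why the escape codes appear when the whole list is printed.
list_display = str(test_list)
rebuilt = '[' + ', '.join(repr(item) for item in test_list) + ']'
assert list_display == rebuilt

# The bytes themselves are untouched: 11 bytes that decode to 10 characters.
first = test_list[0]
assert len(first) == 11
assert len(first.decode('utf-8')) == 10
```

Nothing is converted when the list is printed or iterated over; only the formatting function differs.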

If you want to see lists displayed with the non-debugging output, you have to generate it yourself:

>>> print("['" + "', '".join(test_list) + "']")
['Hafstätter', 'asbds@ages.at']
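
If the goal is to work with text rather than with UTF-8 bytes, one option is to decode every item once and keep the decoded list. The sketch below assumes the items really are UTF-8; decode_items is a hypothetical helper, not something from the question or the standard library, and the b'' and u'' prefixes are there only so the code runs on both Python 2.7 and Python 3.

```python
def decode_items(items, encoding='utf-8'):
    """Return a new list with every byte string decoded to text."""
    return [item.decode(encoding) for item in items]

test_list = [b'Hafst\xc3\xa4tter', b'asbds@ages.at']
text_list = decode_items(test_list)

# Same hand-built display as above, now assembled from text objects.
readable = u"['" + u"', '".join(text_list) + u"']"
```

Note that on Python 2.7 printing text_list itself still goes through repr() and shows escapes (u'Hafst\xe4tter'), so the hand-built string is still needed for a readable list.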

There is a reason for the debugging output. Strings that look identical when printed can still consist of different bytes:

>>> a = 'a\xcc\x88'
>>> b = '\xc3\xa4'
>>> a
'a\xcc\x88'
>>> print a, b   # should look the same; if not, it is the browser's fault :)
ä ä
>>> a==b
False
>>> [a,b]      # In a list you can see the difference by default.
['a\xcc\x88', '\xc3\xa4']
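
The two strings differ because the first is the letter a followed by U+0308 (combining diaeresis), while the second is the single precomposed character U+00E4. As a hedged sketch of how such look-alikes can be compared, the standard unicodedata module can normalize both to the same form after decoding; the variable names mirror the session above and the choice of NFC is an assumption about what "the same" should mean here.

```python
import unicodedata

a = b'a\xcc\x88'.decode('utf-8')   # 'a' + U+0308 COMBINING DIAERESIS
b = b'\xc3\xa4'.decode('utf-8')    # U+00E4, precomposed a with diaeresis

# Different code point sequences, so a plain comparison fails.
assert a != b
assert (len(a), len(b)) == (2, 1)

# After NFC normalization both become the single precomposed character.
same = unicodedata.normalize('NFC', a) == unicodedata.normalize('NFC', b)
assert same
```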