user1050619 - 2 months ago
Python Question

Encoding as UTF-8 and decoding as ISO-8859-16

I'm trying to understand how the transformation from UTF-8 to other encodings works.

In this example, I have a string that I encode with 'utf-8' and decode with 'iso-8859-16'.
I'm just trying to understand how an extra byte is added during the transformation.

>>> r_post='Hello Günter'
>>> r_post=r_post.encode('utf-8')
>>> r_post
b'Hello G\xc3\xbcnter'
>>> r_post=r_post.decode('iso-8859-16')
>>> r_post
'Hello GĂŒnter'

Hello G\xc3\xbcnter

This is a byte string. The escapes \xc3\xbc are the two bytes used for 'ü': in the UTF-8 encoding, the character 'ü' is represented by the two bytes C3 BC.
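As a quick check of the byte counts, here is a minimal sketch using only the built-in str.encode; the variable names are just for illustration. It shows that 'ü' alone becomes two bytes in UTF-8, which is why the encoded value is one byte longer than the string has characters.

```python
text = 'Hello Günter'

# Encode the whole string, and the single character 'ü', as UTF-8.
encoded = text.encode('utf-8')
u_bytes = 'ü'.encode('utf-8')

print(u_bytes)        # the two bytes C3 BC
print(len(text))      # number of characters in the string
print(len(encoded))   # number of bytes after UTF-8 encoding
```

The character count and the byte count differ by one because every other character in the string is plain ASCII and takes a single byte.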

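Reading those bytes and interpreting them in the ISO-8859-16 encoding gives you the characters "Hello GĂŒnter". ISO-8859-16 is a single-byte encoding, so every byte decodes to exactly one character: the byte C3 represents the character 'Ă' and BC represents the character 'Œ'. No byte is added; the two bytes that made up one 'ü' are simply read as two separate characters.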
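To see the byte-by-byte mapping, the sketch below decodes each of the two bytes on its own and then reverses the mistake. It assumes the decoded text survived intact, so that re-encoding with 'iso-8859-16' recovers the original bytes; the variable names are illustrative.

```python
raw = 'Hello Günter'.encode('utf-8')

# Decoding with the wrong codec: each byte becomes one character.
mojibake = raw.decode('iso-8859-16')

# Decode the two bytes of 'ü' one at a time to see what each maps to.
per_byte = [bytes([b]).decode('iso-8859-16') for b in b'\xc3\xbc']
print(per_byte)

# Undo the mistake: turn the characters back into the same bytes,
# then decode those bytes with the codec they were written in.
restored = mojibake.encode('iso-8859-16').decode('utf-8')
print(restored)
```

The round trip works here because both characters exist in ISO-8859-16; in general, the reliable fix is to decode with the same encoding that produced the bytes.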

See "What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text" if you need a more in-depth explanation.