I'm trying to understand how the transformation from UTF-8 to other encodings works.
In this example, I have a string that I encode with 'utf-8' and then decode with 'iso-8859-16'.
How does an extra byte get added during the transformation?
>>> r_post='Hello Günter'
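A minimal, self-contained reproduction of the round trip described in the question. Only the first assignment comes from the original; the encode/decode steps and the length checks are a reconstruction added for illustration:

```python
r_post = 'Hello Günter'

encoded = r_post.encode('utf-8')          # str -> bytes using UTF-8
decoded = encoded.decode('iso-8859-16')   # bytes -> str using a single-byte codec

# 12 characters become 13 bytes, and those 13 bytes decode to 13 characters
print(len(r_post), len(encoded), len(decoded))  # 12 13 13
print(encoded)                                  # b'Hello G\xc3\xbcnter'
print(decoded)                                  # Hello GĂŒnter
```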
Encoding produces a byte string. In UTF-8, the character 'ü' (U+00FC) is represented by two bytes, C3 and BC in hexadecimal, which appear as \xc3\xbc in that byte string.
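To see where those two bytes come from, here is a sketch that hand-encodes a character the way UTF-8 does for code points U+0080 through U+07FF (the `110xxxxx 10xxxxxx` layout). The helper name `utf8_two_byte` is hypothetical and deliberately handles only that two-byte range:

```python
def utf8_two_byte(ch: str) -> bytes:
    """Hand-encode a character in U+0080..U+07FF using UTF-8's two-byte layout."""
    cp = ord(ch)
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("this sketch only handles the two-byte UTF-8 range")
    first = 0xC0 | (cp >> 6)      # 110xxxxx: top 5 of the 11 code-point bits
    second = 0x80 | (cp & 0x3F)   # 10xxxxxx: low 6 bits
    return bytes([first, second])

print(hex(ord('ü')))                              # 0xfc
print(utf8_two_byte('ü').hex(' '))                # c3 bc
print(utf8_two_byte('ü') == 'ü'.encode('utf-8'))  # True
```

For 'ü', 0xFC >> 6 is 3, giving 0xC3, and 0xFC & 0x3F is 0x3C, giving 0xBC.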
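Reading those bytes and interpreting them in the ISO-8859-16 encoding gives you the characters "Hello GĂŒnter". No byte is added: UTF-8 already needed two bytes for 'ü', and ISO-8859-16 is a single-byte encoding, so it decodes each of those bytes into a separate character. In ISO-8859-16, the byte
C3 represents the character 'Ă' and
BC represents the character 'Œ'.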
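A short sketch of the byte-by-byte view, using Python's built-in `iso-8859-16` codec. It also shows that, for this string, nothing is lost: reversing both steps recovers the original text.

```python
raw = 'ü'.encode('utf-8')   # two bytes: C3 BC

# ISO-8859-16 maps each byte to exactly one character, so two bytes -> two characters
for b in raw:
    print(f'{b:02X} -> {bytes([b]).decode("iso-8859-16")!r}')
# C3 -> 'Ă'
# BC -> 'Œ'

# Undo the wrong decode, then decode with the right codec
mojibake = 'Hello Günter'.encode('utf-8').decode('iso-8859-16')
print(mojibake.encode('iso-8859-16').decode('utf-8'))  # Hello Günter
```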
See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text if you need a more in-depth explanation.