I am trying to load a text file, which contains some German letters with
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 26: ordinal not in range(128)
ASCII is limited to characters in the range of [0,128). If you try to decode a byte that is outside that range, one gets that error.
When you read the string in as bytes, you're "widening" the acceptable range of character to [0,256). So your \0xc3 character
Ã is now read in without error. But despite it seeming to work, it's still not "correct".
If your strings are indeed unicode encoded, then the possibility exists that one will contain a multibyte character, that is, a character whose byte representation actually spans multiple bytes.
It is in this case where the difference between reading a file as a byte string and properly decoding it will be quite apparent.
A character like this: č
Will be read in as two bytes, but properly decoded, will be one character:
bytes = bytes('č', encoding='utf-8') print(len(bytes)) # 2 print(len(bytes.decode('utf-8'))) # 1