dublintech dublintech - 4 months ago 10
Python Question

Base64 encoding in Python 3

Following this python example , I do:

>>> import base64
>>> encoded = base64.b64encode(b'data to be encoded')
>>> encoded
b'ZGF0YSB0byBiZSBlbmNvZGVk'


But, if I leave out the leading
b
and do:

>>> encoded = base64.b64encode('data to be encoded')


I get

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python32\lib\base64.py", line 56, in b64encode
raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str


Why is this?

Answer

base64 encoding takes 8-bit binary byte data and encodes it using only the characters A-Z, a-z and 0-9, so it can be transmitted over channels that does not preserve all 8-bits of data, such as email.

Hence, it wants a string of 8-bit bytes. You create those in Python 3 with the b'' syntax.

If you remove the b, it becomes a string. A string is a sequence of Unicode characters. base64 has no idea what to do with Unicode data, it's not 8-bit. It's not really any bits, in fact. :-)

In your second example:

>>> encoded = base64.b64encode('data to be encoded')

All the characters fit neatly into the ASCII character set, and base64 encoding is therefore actually a bit pointless. You can convert it to ascii instead, with

>>> encoded = 'data to be encoded'.encode('ascii')

Or simpler:

>>> encoded = b'data to be encoded'

Which would be the same thing in this case.