pie3636 pie3636 - 6 months ago 34
Python Question

Python - String changing after decoding and encoding again (zlib + base64)

I have this very simple Python code:

in_data = "eNrtmD1Lw0AY..."
print("Input: " + in_data)
out_data = in_data.decode('base64').decode('zlib').encode('zlib').encode('base64')
print("Output: " + out_data)


It outputs:

Input: eNrtmD1Lw0AY...
Output: eJztmE1LAkEY...


The string is also decoded correctly: if I display in_data.decode('base64').decode('zlib'), it gives the expected result.

Also, the formatting is different for both strings:

[Image: Weird formatting]

Why is the decoding/encoding not working properly? Is there some sort of parameter I should use?

Answer

Your input data starts with the hex bytes 78 DA, while your output starts with 78 9C:

>>> 'eNrt'.decode('base64').encode('hex')[:4]
'78da'
>>> 'eJzt'.decode('base64').encode('hex')[:4]
'789c'

DA marks the highest compression level and 9C the default one; the second header byte carries a two-bit compression-level hint. See What does a zlib header look like?
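To see how the header depends on the level, the first two bytes can be printed for a few levels. This is only a sketch: the payload and the helper name zlib_header_hex are made up for illustration, and it uses binascii rather than the 'hex' codec so that it also runs on Python 3.

```python
import binascii
import zlib


def zlib_header_hex(level):
    """Return the first two bytes of a zlib stream as a hex string.

    zlib_header_hex is a hypothetical helper, not part of the zlib module.
    """
    # Arbitrary sample payload; the header does not depend on the data.
    compressed = zlib.compress(b"example payload", level)
    return binascii.hexlify(compressed[:2]).decode("ascii")


# Compare the default level (6) with the maximum level (9).
for level in (6, 9):
    print(level, zlib_header_hex(level))
```

With the standard zlib library, level 6 should report 789c and level 9 should report 78da, matching the two prefixes seen in the question.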

Rather than use .encode('zlib'), use the zlib.compress() function and set the level to 9:

import zlib

# decoded_data holds the bytes returned by in_data.decode('base64').decode('zlib')
zlib.compress(decoded_data, 9).encode('base64')

The 'base64' codec inserts a newline after every 76 characters of output so that the result is suitable for MIME encapsulation (emailing). Use the base64.b64encode() function instead to encode without newlines.
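Putting both fixes together, a round trip might look like the sketch below. The payload is invented, because the question's in_data is truncated, and recompress is a hypothetical helper name. It relies only on base64 and zlib module functions, so it should run on Python 2 and Python 3 alike.

```python
import base64
import zlib


def recompress(b64_text, level=9):
    """Decode base64 + zlib, then re-encode at the given compression level.

    recompress is a hypothetical helper written for this example.
    """
    raw = zlib.decompress(base64.b64decode(b64_text))
    # b64encode returns a single line with no embedded newlines.
    return base64.b64encode(zlib.compress(raw, level))


# Invented sample data standing in for the question's truncated input.
payload = bytes(bytearray(range(256))) * 4
sample = base64.b64encode(zlib.compress(payload, 9))

result = recompress(sample)
print(result == sample)   # same library and level, so the strings match
print(b"\n" in result)    # no MIME-style line breaks
```

Matching the level only reproduces the original string when the input was produced by the same compressor with the same settings. Data compressed by a different tool or library version can still differ byte for byte, even though it decompresses to the same content.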