James James - 16 days ago 10
Python Question

Base64 Incorrect padding error using Python

I am trying to decode Base64 into Hex for about 200 Base64 data and I am getting this following error. It does decoding for 60 of them then stops.

ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
File "tt.py", line 36, in <module>
csvlines[0] = csvlines[0].decode("base64").encode("hex")
File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
output = base64.decodestring(input)
File "C:\Python27\lib\base64.py", line 325, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding


Some original Base64 source from CSV

ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=

Answer

You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.

Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:

value = csvlines[0]
if len(value) % 4:
    # not a multiple of 4, add padding:
    value += '=' * (4 - len(value) % 4) 
csvlines[0] = value.decode("base64").encode("hex")

If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.

For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:

>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
...     # not a multiple of 4, add padding:
...     value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'

I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:

ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=

Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding in padding can make the second value decodable again, but the result would be wrong. We can't tell you which of those two options is the cause here.