Scripting.FileSystemObject - 3 months ago
Python Question

Decoding base64 images from RTF

I want to extract an image from a string in my RTF document.
The string looks like this:

\pard\pard\qc{\*\shppict{\pict\pngblip\picw320\pich192\picwgoal0\pichgoal0
89504e470d0a1a0a0000000d4948445200000140000000c00802000000fa352d9100000e2949444[.....]6c4f0000000049454e44ae426082
}}


Questions:
1) Is this really base64?

2) How can I decode it using the code below?

import base64

imgData = b"base64code00from007aove007string00bcox007idont007know007where007it007starts007and007ends"

with open("imageToSave.png", "wb") as fh:
    fh.write(base64.decodestring(imgData))


The full RTF text (which shows the image when saved as .rtf) is at:

http://hastebin.com/axabazaroc.tex

Answer

No, that's not Base64-encoded data. It is hexadecimal. From the Wikipedia article on the RTF format:

RTF supports inclusion of JPEG, Portable Network Graphics (PNG), Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows Device-dependent bitmap, Windows Device Independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in a RTF file.
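One quick way to confirm the hexadecimal reading is to check that the text contains only hex digits and that its first bytes decode to the PNG file signature. The following is a minimal sketch, assuming Python 3; the helper name looks_like_hex_png is made up for illustration and is not part of any library.

```python
import string

# The eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_hex_png(text):
    """Return True if text is all hex digits and decodes to a PNG header.

    Hypothetical helper for illustration only.
    """
    compact = "".join(text.split())  # drop newlines and spaces
    if len(compact) % 2 or not all(c in string.hexdigits for c in compact):
        return False
    return bytes.fromhex(compact).startswith(PNG_SIGNATURE)


# First bytes of the data quoted in the question.
sample = "89504e470d0a1a0a0000000d49484452"
print(looks_like_hex_png(sample))
```

A Base64-encoded PNG would contain characters outside 0-9 and a-f, so the same check rejects it.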

The binascii.unhexlify() function will decode that back to binary image data for you; you have a PNG image here:

>>> # data contains the hex data from your link, newlines removed
...
>>> from binascii import unhexlify
>>> r = unhexlify(data)
>>> r[:20]
'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
>>> from imghdr import test_png
>>> test_png(r, None)
'png'

Of course, the \pngblip entry was a clue there. I won't include the image here; it is a rather dull 8-bit 320x192 black rectangle.
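Putting the pieces together, here is a minimal sketch of pulling the picture data out of the RTF text itself, assuming Python 3. It assumes each {\pict ...} group holds hex data (not the \bin binary form) and contains no nested groups; the names PICT_GROUP, CONTROL_WORD and extract_pictures are made up for illustration.

```python
import re
from binascii import unhexlify

# Matches a {\pict ...} group that contains no nested groups (an assumption).
PICT_GROUP = re.compile(r"\{\\pict\b([^{}]*)\}")
# An RTF control word: backslash, letters, optional numeric parameter,
# optional single space as delimiter.
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+-?\d* ?")


def extract_pictures(rtf_text):
    """Return the decoded bytes of every hex-encoded \\pict group found."""
    images = []
    for match in PICT_GROUP.finditer(rtf_text):
        # Strip control words such as \pngblip and \picw320 first.
        body = CONTROL_WORD.sub("", match.group(1))
        # What remains is hex digits broken up by whitespace.
        hex_digits = "".join(body.split())
        images.append(unhexlify(hex_digits))
    return images


# Shortened stand-in for the snippet in the question (header bytes only).
sample_rtf = (
    r"\pard\pard\qc{\*\shppict{\pict\pngblip\picw320\pich192"
    r"\picwgoal0\pichgoal0" "\n89504e470d0a1a0a\n0000000d\n}}"
)
images = extract_pictures(sample_rtf)
print(images[0][:8])
```

With the full document, each returned item can be written out the same way as in the question, for example with open("imageToSave.png", "wb") followed by fh.write(images[0]). unhexlify raises binascii.Error if the remaining text is not valid hex, which is a useful signal that the group used a layout this sketch does not handle.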
