In my RTF document, I want to extract an image from a string.
The string looks like this:
imgData = b"base64code00from007aove007string00bcox007idont007know007where007it007starts007and007ends"
with open("imageToSave.png", "wb") as fh:
No, that's not Base64-encoded data. It is hexadecimal. From the Wikipedia article on the RTF format:
RTF supports inclusion of JPEG, Portable Network Graphics (PNG), Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows Device-dependent bitmap, Windows Device Independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in a RTF file.
The binascii.unhexlify() function will decode that back to binary image data for you; you have a PNG image here:
>>> # data contains the hex data from your link, newlines removed
...
>>> from binascii import unhexlify
>>> r = unhexlify(data)
>>> r[:20]
'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
>>> from imghdr import test_png
>>> test_png(r, None)
'png'
but of course the \pngblip entry was a clue there. I won't include the image here; it is a rather dull 8-bit 320x192 black rectangle.
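If you want to pull the image out of the RTF text programmatically, something along these lines might work. This is only a rough sketch: the function names extract_png_blips and png_header_info are made up for illustration, and it assumes the picture is stored in the default hexadecimal form, with no nested groups between \pngblip and the hex data. It checks the PNG signature directly rather than using imghdr, which was removed from the standard library in Python 3.13.

```python
import re
import struct
from binascii import unhexlify

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_png_blips(rtf_text):
    """Return decoded PNG byte strings from hex-encoded pngblip picture groups.

    Assumption: nothing but control words and hex digits sits between
    the pngblip control word and the brace that closes the picture group.
    """
    images = []
    # Capture everything between \pngblip and the next closing brace.
    for match in re.finditer(r"\\pngblip([^{}]*)\}", rtf_text):
        body = match.group(1)
        # Drop remaining control words such as \picw320 or \pich192.
        body = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", body)
        # The hex data is usually wrapped across lines; remove whitespace.
        hex_digits = re.sub(r"\s+", "", body)
        try:
            data = unhexlify(hex_digits)
        except ValueError:
            # Not clean hex (odd length or stray characters); skip it.
            continue
        if data.startswith(PNG_SIGNATURE):
            images.append(data)
    return images


def png_header_info(data):
    """Return (width, height, bit_depth) read from the PNG IHDR chunk."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
    # then big-endian width and height (4 bytes each) and 1 byte bit depth.
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">IIB", data[16:25])
```

Each item returned by extract_png_blips can then be written out with open("imageToSave.png", "wb") and fh.write(...), and png_header_info gives you the dimensions and bit depth without opening the file in an image viewer.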