Decoding base64 images from RTF

In my RTF document, I want to extract an image from a string. The string looks like this:

    \pard\pard\qc{\*\shppict{\pict\pngblip\picw320\pich192\picwgoal0\pichgoal0 
    89504e470d0a1a0a0000000d4948445200000140000000c00802000000fa352d9100000e2949444[.....]6c4f0000000049454e44ae426082
    }}

Questions: 1) Is this really base64?

2) How can I decode it using the code below?

    import base64

    imgData = b"base64code00from007aove007string00bcox007idont007know007where007it007starts007and007ends"

    with open("imageToSave.png", "wb") as fh:
        fh.write(base64.decodestring(imgData))

The full RTF text (which displays the image when saved as .rtf) is at:

http://hastebin.com/axabazaroc.tex

No, that is not Base64-encoded data. It is hexadecimal. From the Wikipedia article on the RTF format:

RTF supports inclusion of JPEG, Portable Network Graphics (PNG), Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows Device-dependent bitmap, Windows Device Independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in a RTF file.

The binascii.unhexlify() function will decode that back to binary image data for you; what you have here is a PNG image:

    >>> # data contains the hex data from your link, newlines removed
    ...
    >>> from binascii import unhexlify
    >>> r = unhexlify(data)
    >>> r[:20]
    '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
    >>> from imghdr import test_png
    >>> test_png(r, None)
    'png'
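The session above appears to be from Python 2 (the output has no b prefix), and the imghdr module was removed in Python 3.13. As a rough Python 3 sketch of the same idea, the hex text can be decoded and checked against the 8-byte PNG signature directly before it is written to disk. The helper names hex_to_image_bytes and save_png are made up for this illustration and are not part of any library:

```python
from binascii import unhexlify

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def hex_to_image_bytes(hex_text):
    """Decode RTF-style hex picture data into raw bytes (hypothetical helper)."""
    # RTF writers usually wrap the hex data over several lines,
    # so strip all whitespace before decoding.
    cleaned = "".join(hex_text.split())
    return unhexlify(cleaned)


def save_png(hex_text, path):
    """Decode the hex text and write it to path if it looks like a PNG."""
    raw = hex_to_image_bytes(hex_text)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    with open(path, "wb") as fh:
        fh.write(raw)
    return raw
```

Calling save_png(data, "imageToSave.png") with the hex string from the document would then take the place of the base64 call in the question.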

The \pngblip entry is a clue there, of course. I won't include the image here; it is a rather dull 8-bit 320x192 black rectangle.
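To pull the hex data out of the RTF text automatically, one option is a regular expression keyed on the \pict and \pngblip control words. This is only a sketch under some assumptions: the picture is stored as hex (not with \bin), the hex run follows the last control word after whitespace, and it ends at the closing brace of the group. The names PNG_PICT and extract_png_images are invented for this example, and a real RTF parser would be more robust:

```python
import re
from binascii import unhexlify

# Assumed layout: {\pict ... \pngblip ... <whitespace> <hex digits> }
# The captured group is the run of hex digits (and whitespace) before the "}".
PNG_PICT = re.compile(
    r"\\pict\b[^{}]*?\\pngblip\b[^{}]*?\s([0-9a-fA-F\s]+)\}"
)


def extract_png_images(rtf_text):
    """Return the decoded bytes of each hex-encoded PNG found in rtf_text."""
    images = []
    for match in PNG_PICT.finditer(rtf_text):
        # Remove line breaks and spaces that wrap the hex data.
        hex_data = "".join(match.group(1).split())
        # Skip empty or odd-length runs, which cannot be valid hex bytes.
        if not hex_data or len(hex_data) % 2:
            continue
        images.append(unhexlify(hex_data))
    return images
```

Each item in the returned list can be written to a file opened in "wb" mode, just as in the question's code.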