Decoding base64 images from RTF
In my RTF document, I want to extract an image from a string. The string looks like this:
\pard\pard\qc{\*\shppict{\pict\pngblip\picw320\pich192\picwgoal0\pichgoal0
89504e470d0a1a0a0000000d4948445200000140000000c00802000000fa352d9100000e2949444[.....]6c4f0000000049454e44ae426082
}}
Questions:
1) Is this really base64?
2) How do I decode it using the code below?
import base64
imgData = b"base64code00from007aove007string00bcox007idont007know007where007it007starts007and007ends"
with open("imageToSave.png", "wb") as fh:
fh.write(base64.decodestring(imgData))
The full RTF text (which shows the image when saved as .rtf) is at:
No, that is not Base64-encoded data. It is hexadecimal. From the Wikipedia article on the RTF format:
RTF supports inclusion of JPEG, Portable Network Graphics (PNG), Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows Device-dependent bitmap, Windows Device Independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in a RTF file.
The binascii.unhexlify() function will decode that back to binary image data for you; you have a PNG image here:
>>> # data contains the hex data from your link, newlines removed
...
>>> from binascii import unhexlify
>>> r = unhexlify(data)
>>> r[:20]
'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
>>> from imghdr import test_png
>>> test_png(r, None)
'png'
But then the \pngblip entry is of course a clue there. I won't include the image here; it is a rather dull 8-bit 320x192 black rectangle.
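To address the "I don't know where it starts and ends" part of the question, here is a minimal sketch of pulling the hex payload out of the RTF text before decoding it. It assumes the picture group looks like the one in the question: \pngblip followed only by plain control words such as \picw320, then hex digits up to the closing brace. The function name extract_png_from_rtf and the regular expression are illustrative choices rather than part of any RTF library, and the sketch does not handle binary (\bin) pictures or nested groups inside \pict.

```python
import re
from binascii import unhexlify

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# \pngblip, then any further control words (\picw320, \pichgoal0, ...),
# then the hex payload (possibly split across lines) up to the closing brace.
# Assumption: no nested groups appear between \pngblip and the hex data.
_PNG_PICT = re.compile(r"\\pngblip(?:\\[a-zA-Z]+-?\d*)*\s*([0-9a-fA-F\s]+)\}")


def extract_png_from_rtf(rtf_text):
    """Return the bytes of the first hex-encoded PNG in rtf_text, or None."""
    match = _PNG_PICT.search(rtf_text)
    if match is None:
        return None
    # unhexlify() rejects whitespace, so strip newlines and spaces first.
    hex_digits = "".join(match.group(1).split())
    data = unhexlify(hex_digits)
    # Sanity check: decoded data should begin with the PNG file signature.
    if not data.startswith(PNG_SIGNATURE):
        return None
    return data
```

The returned bytes can then be written with open("imageToSave.png", "wb"), as in the question's own snippet, with no base64 step involved.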