UnrecognizedImageError - image insertion error - python-docx
I am trying to insert a WMF file into a docx using python-docx, which produces the following traceback.
Traceback (most recent call last):
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 79, in <module>
    read_ppt(path, file)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 73, in read_ppt
    write_docx(ppt_data, False)
  File "C:/Users/ADMIN/PycharmProjects/ppt-to-word/ppt_reader.py", line 31, in write_docx
    document.add_picture(slide_data.get('picture_location'), width=Inches(5.0))
  File "C:\Python34\lib\site-packages\docx\document.py", line 72, in add_picture
    return run.add_picture(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\text\run.py", line 62, in add_picture
    inline = self.part.new_pic_inline(image_path_or_stream, width, height)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 56, in new_pic_inline
    rId, image = self.get_or_add_image(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\parts\story.py", line 29, in get_or_add_image
    image_part = self._package.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 31, in get_or_add_image_part
    return self.image_parts.get_or_add_image_part(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\package.py", line 74, in get_or_add_image_part
    image = Image.from_file(image_descriptor)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 55, in from_file
    return cls._from_stream(stream, blob, filename)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 176, in _from_stream
    image_header = _ImageHeaderFactory(stream)
  File "C:\Python34\lib\site-packages\docx\image\image.py", line 199, in _ImageHeaderFactory
    raise UnrecognizedImageError
docx.image.exceptions.UnrecognizedImageError
The image file is in .wmf format.
Any help or suggestions are appreciated.
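python-docx identifies the type of an image file by "recognizing" its distinctive header. This is how it tells JPEG from PNG, TIFF, and so on. It is much more reliable than mapping the file extension and much more convenient than requiring the user to state the type. It is a pretty common approach.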
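This error indicates that python-docx did not find a header it recognizes. Windows Metafile format (WMF) can be tricky this way: the proprietary spec leaves a lot of leeway, and file specimens in the field vary widely.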
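To illustrate what "recognizing" a header means, here is a minimal sketch of signature-based detection. It is not python-docx's actual code; the table `_SIGNATURES` and the function `sniff_image_format` are hypothetical names, and the byte patterns are the well-known leading bytes of each format.

```python
from typing import Optional

# (format name, offset, expected bytes) for a few common image formats.
# Illustrative only; python-docx keeps its own table.
_SIGNATURES = (
    ("PNG", 0, b"\x89PNG\r\n\x1a\n"),
    ("JPEG", 0, b"\xff\xd8\xff"),
    ("GIF", 0, b"GIF87a"),
    ("GIF", 0, b"GIF89a"),
    ("TIFF", 0, b"II*\x00"),  # little-endian TIFF
    ("TIFF", 0, b"MM\x00*"),  # big-endian TIFF
    ("BMP", 0, b"BM"),
)


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format whose signature matches the header, or None."""
    for name, offset, magic in _SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return name
    return None  # the situation in which a library would raise an error


if __name__ == "__main__":
    with open("1.jpg", "rb") as f:  # assumes a file named 1.jpg exists
        print(sniff_image_format(f.read(32)))
```

A WMF file matches none of these entries, which is the situation the traceback above reports.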
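To fix this, I recommend reading the file with something that does recognize it (I would start with Pillow) and having it "convert" the file into the same or another format, hopefully correcting the header in the process.
First I would try just reading it and saving it as WMF (or perhaps EMF, if that is an option). This might be enough to do the trick. If you have to go through an intermediate format and back, that could be lossy, but it is better than nothing.
ImageMagick might be another good option to try, since it probably has better format coverage than Pillow.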
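The ImageMagick route could be sketched roughly as follows. This assumes ImageMagick 7 is installed and exposes the `magick` executable on PATH, and that the build can actually read WMF, which depends on how it was compiled. The helper names `build_convert_command` and `convert_with_imagemagick` are hypothetical, and PNG is chosen as the target only because python-docx recognizes it.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_convert_command(src: str, dst: str, executable: str = "magick") -> List[str]:
    """Build the argument list for `magick <src> <dst>`.

    ImageMagick picks the output format from the destination extension.
    """
    return [executable, src, dst]


def convert_with_imagemagick(src: str, dst: Optional[str] = None) -> str:
    """Convert src to dst (default: same name with a .png extension)."""
    if dst is None:
        dst = str(Path(src).with_suffix(".png"))
    # check=True raises CalledProcessError if ImageMagick reports a failure.
    subprocess.run(build_convert_command(src, dst), check=True)
    return dst


if __name__ == "__main__":
    from docx import Document

    document = Document()
    document.add_picture(convert_with_imagemagick("picture.wmf"))  # hypothetical file
    document.save("test.docx")
```

On Windows, pass the full path to the ImageMagick executable if `magick` is not on PATH; the bare name `convert` is avoided here because Windows ships an unrelated system tool with that name.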
Explanation
python-docx/image.py reads the supported picture file formats from SIGNATURES.
Formats
1.jpg
Use an image converter to convert 1.jpg to different file formats.
Use magic to get the MIME type.
File format | Mime type | add_picture() |
---|---|---|
.jpg | image/jpeg | √ |
.png | image/png | √ |
.jfif | image/jpeg | √ |
.exif | | √ |
.gif | image/gif | √ |
.tiff | image/tiff | √ |
.bmp | image/x-ms-bmp | √ |
.eps | application/postscript | × |
.hdr | application/octet-stream | × |
.ico | image/x-icon | × |
.svg | image/svg+xml | × |
.tga | image/x-tga | × |
.wbmp | application/octet-stream | × |
.webp | image/webp | × |
How to fix it
Plan A
Convert other formats to a supported format such as .jpg
Install
pip install pillow
Code
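from pathlib import Path

from PIL import Image

# Extensions that add_picture() accepted in the table above.
SUPPORTED_SUFFIXES = {'.jpg', '.png', '.jfif', '.exif', '.gif', '.tiff', '.bmp'}


def image_to_jpg(image_path):
    """Return a path python-docx can read, converting to JPEG if needed."""
    path = Path(image_path)
    # Compare case-insensitively so that '1.JPG' is not converted needlessly.
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        jpg_image_path = f'{path.parent / path.stem}_result.jpg'
        # JPEG has no alpha channel, so convert to RGB before saving.
        Image.open(image_path).convert('RGB').save(jpg_image_path)
        return jpg_image_path
    return image_path


if __name__ == '__main__':
    from docx import Document

    document = Document()
    document.add_picture(image_to_jpg('1.jpg'))
    document.add_picture(image_to_jpg('1.webp'))
    document.save('test.docx')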
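Plan B
First, try adding the picture to Word manually. If that succeeds, Word supports the format. Then modify the library: subclass BaseImageHeader, implement the from_stream() method, and add the image format to SIGNATURES.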
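As a starting point for Plan B, here is a minimal sketch of the kind of header parsing a from_stream() implementation would need for WMF. It assumes the file begins with the 22-byte placeable metafile header, which not every WMF file has. The names `read_placeable_header` and `PlaceableInfo` are hypothetical and not part of python-docx, and the sketch does not verify the header checksum or register anything with the library.

```python
import struct
from typing import NamedTuple, Optional

# Placeable WMF header, little-endian: key, handle, left, top, right,
# bottom, units per inch, reserved, checksum (22 bytes in total).
APM_KEY = 0x9AC6CDD7
APM_STRUCT = struct.Struct("<IHhhhhHIH")


class PlaceableInfo(NamedTuple):
    width_in: float      # picture width in inches
    height_in: float     # picture height in inches
    units_per_inch: int  # logical units per inch declared by the file


def read_placeable_header(data: bytes) -> Optional[PlaceableInfo]:
    """Parse the placeable header, or return None if it is absent."""
    if len(data) < APM_STRUCT.size:
        return None
    key, _handle, left, top, right, bottom, inch, _reserved, _checksum = (
        APM_STRUCT.unpack_from(data)
    )
    if key != APM_KEY or inch == 0:
        return None
    return PlaceableInfo((right - left) / inch, (bottom - top) / inch, inch)


if __name__ == "__main__":
    with open("picture.wmf", "rb") as f:  # hypothetical file name
        print(read_placeable_header(f.read(APM_STRUCT.size)))
```

The width, height, and resolution recovered this way are the values an image header class would have to report for the new format.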
Missing file extension
Rename 1.jpg to 1
from docx import Document
document = Document()
document.add_picture('1')
document.save('test.docx')
It will show this:
[image]
Use this instead:
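from docx import Document
document = Document()
# Pass an open binary stream instead of a path, and close it afterwards.
with open('1', mode='rb') as image_file:
    document.add_picture(image_file)
document.save('test.docx')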
Conclusion
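import io
from pathlib import Path

import magic  # provided by the python-magic package
from PIL import Image

SUPPORTED_SUFFIXES = {'.jpg', '.png', '.jfif', '.exif', '.gif', '.tiff', '.bmp'}
# Some libmagic versions report BMP as 'image/bmp' rather than 'image/x-ms-bmp'.
SUPPORTED_MIME_TYPES = {'image/jpeg', 'image/png', 'image/gif', 'image/tiff',
                        'image/x-ms-bmp', 'image/bmp'}


def image_to_jpg(image_path_or_stream):
    """Return a binary stream python-docx can read, converting to JPEG if needed."""
    f = io.BytesIO()
    if isinstance(image_path_or_stream, str):
        path = Path(image_path_or_stream)
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            # Load the bytes into memory so no file handle is left open.
            f = io.BytesIO(path.read_bytes())
        else:
            Image.open(image_path_or_stream).convert('RGB').save(f, format='JPEG')
    else:
        # A stream has no file name to inspect, so detect the type from its content.
        buffer = image_path_or_stream.read()
        mime_type = magic.from_buffer(buffer, mime=True)
        if mime_type in SUPPORTED_MIME_TYPES:
            f = io.BytesIO(buffer)
        else:
            Image.open(io.BytesIO(buffer)).convert('RGB').save(f, format='JPEG')
    f.seek(0)  # rewind so the caller reads from the start
    return f


if __name__ == '__main__':
    from docx import Document

    document = Document()
    document.add_picture(image_to_jpg('1.jpg'))
    document.add_picture(image_to_jpg('1.webp'))
    with open('1.jpg', mode='rb') as stream:
        document.add_picture(image_to_jpg(stream))
    with open('1', mode='rb') as stream:  # copy 1.webp and rename it to 1
        document.add_picture(image_to_jpg(stream))
    document.save('test.docx')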