Extracting images from presentation file
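I'm working with the python-pptx package. For my code, I need to extract all the images present in a presentation file. Can anybody help me with this?
Thanks in advance for your help.
My code looks like this: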
import pptx

# filename: path to the .pptx file
prs = pptx.Presentation(filename)
for slide in prs.slides:
    for shape in slide.shapes:
        print(shape.shape_type)
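Printing shape_type shows the PICTURE (13) shapes present in the ppt, but I want to extract the pictures into the folder where the code is located.
Use this PPTExtractor repo as a reference.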
ppt = PPTExtractor("some/PowerPointFile")
# found images
len(ppt)
# image list
images = ppt.namelist()
# extract image
ppt.extract(images[0])
# save image with different name
ppt.extract(images[0], "nuevo-nombre.png")
# extract all images
ppt.extractall()
To save images in a different directory:
ppt.extract("image.png", path="/another/directory")
ppt.extractall(path="/another/directory")
A Picture (shape) object in python-pptx provides access to the image it displays:
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                yield shape

# filename: path to the .pptx file
for picture in iter_picture_shapes(Presentation(filename)):
    image = picture.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image.jpg'---
    # (every picture with the same extension gets the same name, so later files overwrite earlier ones)
    image_filename = 'image.%s' % image.ext
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)
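Generating unique file names is left to you as an exercise; all the other bits you need are here.
For more details on the Image object, see the documentation here:
https://python-pptx.readthedocs.io/en/latest/api/image.html#image-objects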
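One way to do the "unique file name" exercise is to derive the name from a hash of the image bytes, so the same picture reused on several slides is written only once. The sketch below is stdlib-only and takes the raw bytes and extension (what `image.blob` and `image.ext` give you in the loop above); the helper names `unique_image_filename` and `save_image` are hypothetical, not part of python-pptx.

```python
import hashlib
from pathlib import Path


def unique_image_filename(image_bytes: bytes, ext: str, prefix: str = "image") -> str:
    """Build a file name from a SHA-1 digest of the image bytes.

    Identical images map to the same name; different images get different names
    (barring a hash collision in the first 12 hex digits).
    """
    digest = hashlib.sha1(image_bytes).hexdigest()[:12]
    return f"{prefix}-{digest}.{ext}"


def save_image(image_bytes: bytes, ext: str, out_dir: Path) -> Path:
    """Write image_bytes into out_dir unless a file with that content-derived name exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / unique_image_filename(image_bytes, ext)
    if not path.exists():
        path.write_bytes(image_bytes)
    return path
```

Inside the loop you could call `save_image(image.blob, image.ext, Path(__file__).parent)` to put the files next to the script, as the question asked. Hashing trades readable names like `image001.png` for automatic de-duplication; a running counter is the simpler alternative if duplicates don't matter.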
scanny's solution didn't work for me because I have picture elements inside group elements. This worked for me:
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

n = 0

def write_image(shape):
    global n
    image = shape.image
    # ---get image "file" contents---
    image_bytes = image.blob
    # ---make up a name for the file, e.g. 'image000.jpg'---
    image_filename = 'image{:03d}.{}'.format(n, image.ext)
    n += 1
    print(image_filename)
    with open(image_filename, 'wb') as f:
        f.write(image_bytes)

def visitor(shape):
    # recurse into group shapes so pictures nested inside them are found too
    if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
        for s in shape.shapes:
            visitor(s)
    if shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
        write_image(shape)

def iter_picture_shapes(prs):
    for slide in prs.slides:
        for shape in slide.shapes:
            visitor(shape)

# filename: path to the .pptx file
iter_picture_shapes(Presentation(filename))
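The same group-aware traversal can be written as a generator, which avoids the global counter: the caller numbers pictures with `enumerate`. This is a sketch, assuming (as in python-pptx) that a group shape exposes its children through a `.shapes` attribute and ordinary shapes do not; `iter_leaf_shapes` and `iter_pictures` are hypothetical helper names, and the picture type is passed in so the code itself does not import python-pptx.

```python
from typing import Any, Iterable, Iterator


def iter_leaf_shapes(shapes: Iterable[Any]) -> Iterator[Any]:
    """Yield every non-group shape, descending into groups at any depth.

    Assumption: only group shapes have a `.shapes` attribute.
    """
    for shape in shapes:
        children = getattr(shape, "shapes", None)
        if children is not None:
            # a group: walk its members recursively
            yield from iter_leaf_shapes(children)
        else:
            yield shape


def iter_pictures(slides: Iterable[Any], picture_type: Any) -> Iterator[Any]:
    """Yield picture shapes from every slide, including ones nested in groups."""
    for slide in slides:
        for shape in iter_leaf_shapes(slide.shapes):
            if shape.shape_type == picture_type:
                yield shape
```

With python-pptx this would be used as `for n, pic in enumerate(iter_pictures(prs.slides, MSO_SHAPE_TYPE.PICTURE)):` followed by writing `pic.image.blob` to `'image{:03d}.{}'.format(n, pic.image.ext)`.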
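A PowerPoint presentation is just a zip file. Rename the .pptx to .zip and you can browse its contents.
Unzip the file, find the media folder (ppt/media inside the archive), and take the image files from it; it only takes a few lines of code. Done. (You don't need python-pptx for this, although it is a good library for creating pptx files.)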
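The zip approach above can be sketched with the standard library alone; Python's `zipfile` opens a .pptx directly, so no renaming is needed. This assumes the media parts live directly under `ppt/media/`, which is where PowerPoint normally stores them; note that folder can also hold audio or video, not only images. `extract_media` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path, PurePosixPath
from typing import List


def extract_media(pptx_path, out_dir) -> List[Path]:
    """Copy every file stored directly under ppt/media/ in a .pptx into out_dir.

    Returns the list of written paths.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(pptx_path) as zf:
        for name in zf.namelist():
            member = PurePosixPath(name)
            # keep only entries whose parent folder is exactly ppt/media
            if member.parent == PurePosixPath("ppt/media"):
                # use only the base name so archive paths cannot escape out_dir
                target = out / member.name
                target.write_bytes(zf.read(name))
                written.append(target)
    return written
```

Unlike the python-pptx answers, this also picks up media that is not placed as a Picture shape (for example slide backgrounds), but it cannot tell you which slide an image belongs to.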