Remove all metadata from a PowerPoint presentation using python-pptx
I can remove/overwrite some of the metadata (the metadata stored in core.xml) using the following code:
from datetime import datetime

import pptx

def remove_metadata(prs):
    """Overwrites the metadata in core.xml; does not touch the metadata stored in app.xml."""
    prs.core_properties.title = 'PowerPoint Presentation'
    prs.core_properties.last_modified_by = 'python-pptx'
    prs.core_properties.revision = 1
    prs.core_properties.modified = datetime.utcnow()
    prs.core_properties.subject = ''
    prs.core_properties.author = 'python-pptx'
    prs.core_properties.keywords = ''
    prs.core_properties.comments = ''
    prs.core_properties.created = datetime.utcnow()
    prs.core_properties.category = ''

prs = pptx.Presentation('my_pres.pptx')
remove_metadata(prs)
This works well, but app.xml stores additional metadata, such as Company and Manager. I need to clear those properties too. How can I edit the app.xml file using python-pptx?
I found a solution. It is not necessarily the ideal way to handle this, but it seems to work:
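import re

def remove_metadata_from_app_xml(prs):
    """There is currently no functionality for handling app.xml, so we
    have to find the part and then alter its blob manually.
    """
    app_xml_part = None
    for part in prs.part.package.parts:
        if part.partname.endswith('app.xml'):
            app_xml_part = part
    if app_xml_part is None:
        return  # this package has no app.xml part, nothing to clean
    app_xml = app_xml_part.blob.decode('utf-8')
    tags_to_remove = ('Company', 'Manager', 'HyperlinkBase')
    for tag in tags_to_remove:
        # Non-greedy match; '/' needs no escaping in a Python regex
        pattern = f'<{tag}>.*?</{tag}>'
        app_xml = re.sub(pattern, '', app_xml)
    # Assign the private _blob attribute (relies on python-pptx internals;
    # assumes the public blob property has no setter)
    app_xml_part._blob = app_xml.encode('utf-8')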
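
To check the tag-stripping step on its own, without opening a presentation, here is a minimal sketch of the same regex approach applied to a plain string. The helper name `strip_tags` and the sample XML are hypothetical and only for illustration; a real app.xml also carries namespace declarations and more elements.

```python
import re

def strip_tags(xml_text, tags):
    """Return xml_text with every <tag>...</tag> element in `tags` removed."""
    for tag in tags:
        # Non-greedy match, with DOTALL so a value spanning several lines is removed too
        pattern = rf"<{re.escape(tag)}>.*?</{re.escape(tag)}>"
        xml_text = re.sub(pattern, "", xml_text, flags=re.DOTALL)
    return xml_text

# Hypothetical, abbreviated app.xml content for illustration only
sample = (
    "<Properties>"
    "<Application>Microsoft Office PowerPoint</Application>"
    "<Manager>Jane Doe</Manager>"
    "<Company>Example Corp</Company>"
    "</Properties>"
)

cleaned = strip_tags(sample, ("Company", "Manager", "HyperlinkBase"))
print(cleaned)
```

Tags that are absent from the input (here `HyperlinkBase`) are simply skipped, so the same tuple can be reused across files.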