Python pptx (Power Point) Find and replace text (ctrl + H)

Question

Question in short: how can I use the Find and Replace option (Ctrl+H) with the python-pptx module?

Example code:

from pptx import Presentation

nameOfFile = "NewPowerPoint.pptx" #Replace this with: path name on your computer + name of the new file.

def open_PowerPoint_Presentation(oldFileName, newFileName):
    prs = Presentation(oldFileName)

    prs.save(newFileName)
open_PowerPoint_Presentation('Template.pptx', nameOfFile)

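I have a PowerPoint document named "Template.pptx". In my Python program I add some slides and put some pictures in them. Once all the pictures are in the document, it is saved as another PowerPoint presentation.

The problem is that "Template.pptx" has all the old week numbers in it, like "Week 20". I want Python to find all of these word combinations and replace them with "Week 25" (for example).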

Answer 1

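You would have to visit each shape on each slide and look for a match using the available text features. It might not be pretty, because PowerPoint has a habit of splitting runs into what may look like odd chunks. It does this to support features like spell checking, but its behavior there is unpredictable.

So finding the occurrences with something like Shape.text will probably be the easy part. Replacing them without losing any font formatting might be more difficult, depending on the particulars of your situation.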
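As a rough illustration of the "easy part", here is a minimal sketch of a finder that walks every shape on every slide and also descends into grouped shapes. It relies only on attributes that python-pptx shapes expose (`has_text_frame`, `text_frame.text`, `name`, and `.shapes` on group shapes). The `fake_shape` helper and the sample slides are stand-ins invented here so the sketch runs without a .pptx file.

```python
from types import SimpleNamespace


def iter_shapes(shapes):
    """Yield every shape, descending into group shapes.

    Assumption: only group shapes expose their children as `.shapes`.
    """
    for shape in shapes:
        yield shape
        children = getattr(shape, "shapes", None)
        if children is not None:
            yield from iter_shapes(children)


def find_text(slides, needle):
    """Return (slide_number, shape_name) for each shape whose text contains needle."""
    hits = []
    for number, slide in enumerate(slides, start=1):
        for shape in iter_shapes(slide.shapes):
            if shape.has_text_frame and needle in shape.text_frame.text:
                hits.append((number, shape.name))
    return hits


# Stand-in objects (hypothetical); with python-pptx you would pass
# Presentation("Template.pptx").slides instead.
def fake_shape(name, text=None, children=None):
    shape = SimpleNamespace(name=name, has_text_frame=text is not None)
    if text is not None:
        shape.text_frame = SimpleNamespace(text=text)
    if children is not None:
        shape.shapes = children
    return shape


slides = [
    SimpleNamespace(shapes=[fake_shape("Title 1", "Report Week 20")]),
    SimpleNamespace(shapes=[
        fake_shape("Picture 2"),
        fake_shape("Group 3", children=[fake_shape("TextBox 4", "Week 20 totals")]),
    ]),
]
hits = find_text(slides, "Week 20")
print(hits)
```

This only reports where the text is; it does not touch the runs, so no formatting is at risk yet.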

Answer 2

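I know this question is old, but I have just finished a project that uses Python to update a PowerPoint daily. Basically, every morning the Python script runs, pulls the data for that day from a database, places the data in the PowerPoint, and then launches PowerPoint Viewer to play it.

To answer your question: you have to loop through all the shapes on the slide and check whether the string you are searching for is in shape.text. You can check whether a shape has text by testing whether shape.has_text_frame is true. This avoids errors.

This is where things get tricky. If you just replace the string in shape.text with the text you want to insert, you will probably lose formatting. shape.text is actually a concatenation of all the text in the shape. That text may be split into lots of 'runs', and all of those runs may have different formatting that will be lost if you write over shape.text or replace part of the string.

On the slide you have shapes; shapes can have a text_frame; text_frames have paragraphs (at least one, always, even when it is blank); and paragraphs can have runs. Any level can carry formatting, and you have no way of knowing how many runs your string is split across.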
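To make the run problem concrete, here is a small sketch that uses plain strings in place of runs. The particular split of "Report for Week 20" is invented for illustration; the point is that no single run contains the search string, yet the joined text does, and the match can be mapped back to a slice of each run it touches.

```python
def locate(run_texts, needle):
    """Find the first match of needle in the joined run texts.

    Returns one (run_index, start, end) slice per run the match touches,
    or [] if there is no match.
    """
    joined = "".join(run_texts)
    begin = joined.find(needle)
    if begin == -1:
        return []
    end = begin + len(needle)
    pieces = []
    offset = 0  # position of the current run within the joined text
    for index, text in enumerate(run_texts):
        lo = max(begin, offset)
        hi = min(end, offset + len(text))
        if lo < hi:  # the match overlaps this run
            pieces.append((index, lo - offset, hi - offset))
        offset += len(text)
    return pieces


# Hypothetical split of one paragraph into three runs.
run_texts = ["Report for We", "ek 2", "0"]
found_in_single_run = any("Week 20" in text for text in run_texts)
pieces = locate(run_texts, "Week 20")
print(found_in_single_run)
print(pieces)
```

With python-pptx the same list could be built as `[run.text for run in paragraph.runs]`.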

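In my case I made sure that any string that was going to be replaced was in its own shape. You still have to drill all the way down to the run and set the text there so that all formatting is preserved. Also, the string you matched in shape.text may actually be spread across multiple runs, so when setting the text in the first run, I also set the text of all the other runs in that paragraph to blank.

Random code snippet: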

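from pptx import Presentation

testString = '{{thingToReplace}}'
replaceString = 'this will be inserted'
ppt = Presentation('somepptxfile.pptx')

def replaceText(shape, string, replaceString):
    # this is the hard part
    # you know the string is in there, but it may be across many runs
    for paragraph in shape.text_frame.paragraphs:
        runs = paragraph.runs
        whole_text = ''.join(run.text for run in runs)
        if not runs or string not in whole_text:
            continue
        # set the text in the first run (keeping its formatting) and
        # blank out all the other runs in the paragraph
        runs[0].text = whole_text.replace(string, replaceString)
        for run in runs[1:]:
            run.text = ''


for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            if shape.text.find(testString) != -1:
                replaceText(shape, testString, replaceString)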

Sorry if there are any typos. I'm at work.....

Answer 3

Here is some code that could help. I found it here:

from pptx import Presentation

search_str = '{{{old text}}}'
repl_str = 'changed Text'
ppt = Presentation('Presentation1.pptx')
for slide in ppt.slides:
    for shape in slide.shapes:
        if shape.has_text_frame:
            shape.text = shape.text.replace(search_str, repl_str)
ppt.save('Presentation1.pptx')

Answer 4

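For those of you who just want some code to copy and paste into your program that finds and replaces text in a PowerPoint while keeping formatting (just like I did), here you go: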

from pptx import Presentation

def search_and_replace(search_str, repl_str, input, output):
    """Search and replace text in PowerPoint while preserving formatting."""
    #Useful Links ;)
    #
    #
    prs = Presentation(input)
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                if shape.text.find(search_str) != -1:
                    # only the first run of the first paragraph is edited
                    runs = shape.text_frame.paragraphs[0].runs
                    if runs:  # a paragraph can have no runs at all
                        cur_text = runs[0].text
                        new_text = cur_text.replace(str(search_str), str(repl_str))
                        runs[0].text = new_text
    prs.save(output)

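The above is a combination of many answers, but it gets the job done. It simply replaces search_str with repl_str wherever search_str occurs in the first run of the first paragraph of a matching shape.

In the context of this answer, you would use: search_and_replace('Week 20', 'Week 25', "Template.pptx", "NewPowerPoint.pptx")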

Answer 5

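Posting code from my own project, because none of the other answers quite managed to handle strings in complex text with multiple paragraphs without losing formatting: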

from typing import List

from pptx import Presentation

prs = Presentation('blah.pptx')

# To get shapes in your slides
slides = [slide for slide in prs.slides]
shapes = []
for slide in slides:
    for shape in slide.shapes:
        shapes.append(shape)

def replace_text(replacements: dict, shapes: List):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        for run in paragraph.runs:
                            cur_text = run.text
                            new_text = cur_text.replace(str(match), str(replacement))
                            run.text = new_text
            if shape.has_table:
                for row in shape.table.rows:
                    for cell in row.cells:
                        if match in cell.text:
                            new_text = cell.text.replace(match, replacement)
                            cell.text = new_text

replace_text({'string to replace': 'replacement text'}, shapes)
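One thing to keep in mind about the table branch: assigning to cell.text rewrites the whole cell, so run-level formatting inside a cell can be lost. A minimal sketch of an alternative is to treat table cells like text frames and edit their runs too. It assumes each cell exposes `text_frame` (as python-pptx cells do). The `NS` stand-in objects below are invented so the sketch runs without a file, and it only handles matches that sit entirely inside one run.

```python
from types import SimpleNamespace as NS


def iter_paragraphs(shape):
    """Yield paragraphs from a shape's text frame and from its table cells."""
    if shape.has_text_frame:
        yield from shape.text_frame.paragraphs
    if getattr(shape, "has_table", False):
        for row in shape.table.rows:
            for cell in row.cells:
                yield from cell.text_frame.paragraphs


def replace_in_runs(shape, match, replacement):
    """Replace match inside each run; return how many runs were changed."""
    changed = 0
    for paragraph in iter_paragraphs(shape):
        for run in paragraph.runs:
            if match in run.text:
                run.text = run.text.replace(match, replacement)
                changed += 1
    return changed


# Hypothetical stand-ins for a one-cell table shape.
def para(*texts):
    return NS(runs=[NS(text=t) for t in texts])


cell = NS(text_frame=NS(paragraphs=[para("Week 20", " sales")]))
table_shape = NS(has_text_frame=False, has_table=True,
                 table=NS(rows=[NS(cells=[cell])]))
count = replace_in_runs(table_shape, "Week 20", "Week 25")
cell_texts = [run.text for run in cell.text_frame.paragraphs[0].runs]
print(count, cell_texts)
```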

Answer 6

Combining the above and other replies in a way that worked well for me (PYTHON 3). All the original formatting is kept:

from pptx import Presentation

def replace_text(replacements, shapes):
    """Takes dict of {match: replacement, ... } and replaces all matches.
    Currently not implemented for charts or graphics.
    """
    for shape in shapes:
        for match, replacement in replacements.items():
            if shape.has_text_frame:
                if (shape.text.find(match)) != -1:
                    text_frame = shape.text_frame
                    for paragraph in text_frame.paragraphs:
                        whole_text = "".join(run.text for run in paragraph.runs)
                        whole_text = whole_text.replace(str(match), str(replacement))
                        for idx, run in enumerate(paragraph.runs):
                            if idx != 0:
                                p = paragraph._p
                                p.remove(run._r)
                        if bool(paragraph.runs):
                            paragraph.runs[0].text = whole_text

if __name__ == '__main__':

    prs = Presentation('input.pptx')
    # To get shapes in your slides
    slides = [slide for slide in prs.slides]
    shapes = []
    for slide in slides:
        for shape in slide.shapes:
            shapes.append(shape)

    replaces = {
        '{{var1}}': 'text 1',
        '{{var2}}': 'text 2',
        '{{var3}}': 'text 3',
    }
    replace_text(replaces, shapes)
    prs.save('output.pptx')

Answer 7

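I ran into a similar problem where a formatted placeholder was spread over multiple run objects. I wanted to keep the formatting, so I could not do the replacement at the paragraph level. Finally, I figured out a way to replace the placeholder.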

import re

variable_pattern = re.compile(r"\{\{(\w+)\}\}")

# NOTE: the original answer does not show `data` or `replace_variable_with`.
# The two definitions below are assumptions added so the snippet can run.
data = {}  # maps a variable name to its replacement text; fill this in

def replace_variable_with(run, data, matches):
    for name in matches:
        run.text = run.text.replace("{{%s}}" % name, str(data[name]))

def process_shape_with_text(shape, variable_pattern):
    if not shape.has_text_frame:
        return

    whole_paragraph = shape.text
    matches = variable_pattern.findall(whole_paragraph)
    if len(matches) == 0:
        return

    # first pass: placeholders that sit entirely inside one run
    is_found = False
    for paragraph in shape.text_frame.paragraphs:
        for run in paragraph.runs:
            matches = variable_pattern.findall(run.text)
            if len(matches) == 0:
                continue
            replace_variable_with(run, data, matches)
            is_found = True

    if not is_found:
        print("Not found the matched variables in the run segment but in the paragraph, target -> %s" % whole_paragraph)

        matches = variable_pattern.finditer(whole_paragraph)
        space_prefix = re.match(r"^\s+", whole_paragraph)

        match_container = [x for x in matches]
        need_modification = []  # (run, new text) pairs, applied at the end
        for i in range(len(match_container)):
            m = match_container[i]
            # re.match returns None when there is no leading whitespace
            path_recorder = space_prefix.group(0) if space_prefix else ""

            (start_0, end_0) = m.span(0)
            (start_1, end_1) = m.span(1)

            if (i + 1) > len(match_container) - 1:
                right = end_0 + 1
            else:
                right = match_container[i + 1].start(0)

            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    segment = run.text
                    path_recorder += segment

                    if len(path_recorder) >= start_0 + 1 and len(path_recorder) <= right:
                        print("find it")

                        if len(path_recorder) <= start_1:
                            # run holds only the opening braces
                            need_modification.append((run, run.text.replace('{', '')))

                        elif len(path_recorder) <= end_1:
                            # run holds the variable name
                            need_modification.append((run, data[m.group(1)]))

                        elif len(path_recorder) <= right:
                            # run holds the closing braces
                            need_modification.append((run, run.text.replace('}', '')))

        for run, value in need_modification:
            run.text = value
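A different way to express the same idea, offered only as a sketch and not as the method used above: join the run texts, find each match in the joined string, write the replacement into the run where the match starts, and cut the matched characters out of any later runs it touches. Every run object survives, so its formatting does too. The function name `replace_across_runs` and the sample runs are invented here; the function only needs objects with a writable `.text`, which is what `paragraph.runs` provides in python-pptx.

```python
from types import SimpleNamespace as NS


def replace_across_runs(runs, needle, replacement):
    """Replace every occurrence of needle, even when it spans several runs.

    Returns the number of replacements made.
    """
    if not needle:
        return 0
    runs = list(runs)
    count = 0
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        begin = "".join(texts).find(needle, search_from)
        if begin == -1:
            return count
        end = begin + len(needle)
        offset = 0  # position of the current run within the joined text
        first = True
        for run, text in zip(runs, texts):
            lo = max(begin, offset) - offset
            hi = min(end, offset + len(text)) - offset
            if lo < hi:  # the match overlaps this run
                insert = replacement if first else ""
                run.text = text[:lo] + insert + text[hi:]
                first = False
            offset += len(text)
        count += 1
        # continue after the inserted text so the replacement is never rescanned
        search_from = begin + len(replacement)


# Hypothetical paragraph whose placeholders are split across three runs.
runs = [NS(text="Report for {{na"), NS(text="me}} and {{name"), NS(text="}}!")]
replaced = replace_across_runs(runs, "{{name}}", "Ann")
result = [run.text for run in runs]
print(replaced, result)
```

The replacement takes on the formatting of the run where the match begins, which is a design choice; runs emptied by the cut are left in place with empty text.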


python

powerpoint

python-2.7

python-pptx