Preserve text format on read/write to shape text python pptx

Question

I would like to perform a text replacement in the text of a shape. I'm using code similar to the snippet below:

# define key/value
SRKeys, SRVals = ['x','y','z'], [1,2,3]

# define text
text = shape.text

# iterate through values and perform subs
for i in range(len(SRKeys)):
    # replace text
    text = text.replace(SRKeys[i], str(SRVals[i]))

# write text subs to comment box
shape.text = text

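However, if the initial shape.text contains formatted characters (bold, for example), the formatting is lost once the text is read and written back. Is there a way around this?

The only thing I can think of is to iterate over the characters, check their formatting, and then re-apply that formatting before writing to shape.text.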

Answer 1

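@usr2564301 is on the right track. Character formatting (a.k.a. "font") is specified at the run level. That is what a run is: a "run" (sequence) of characters that all share the same character formatting.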

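When you assign to shape.text, you replace all the runs that used to be there with a single new run that has default formatting. If you want to preserve formatting, you need to preserve every run that is not directly involved in the text replacement.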

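This is not a trivial problem, because there is no guarantee that runs break on word boundaries. Try printing out the runs for a few paragraphs and I think you'll see what I mean.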
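To make the word-boundary point concrete, here is a small illustration. The run texts are made up for the example; with python-pptx you would collect the real ones with something like `[r.text for r in paragraph.runs]`.

```python
# Hypothetical run texts for one paragraph. The word "quick" is split
# across the first two runs, which real files often do after editing.
run_texts = ["The quic", "k br", "own fox"]
target = "quick"

# Searching run by run misses the word entirely...
found_in_single_run = any(target in text for text in run_texts)

# ...while searching the joined paragraph text finds it.
found_in_paragraph = target in "".join(run_texts)

print(found_in_single_run, found_in_paragraph)
```

So a per-run `str.replace` can silently skip matches that straddle a run boundary.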

In rough pseudocode, I think this is the approach you would need to take:

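Search the paragraph for your target text to identify the offset of its first character.
Traverse all the runs in the paragraph, keeping a running total of how many characters come before each run, maybe something like (run_idx, prefix_len, length): (0, 0, 8), (1, 8, 4), (2, 12, 9), etc.
Identify which runs are the starting, ending, and in-between runs that involve your search string.
Split the first run at the start of the search term, split the last run at the end of the search term, and delete all but the first of the "middle" runs.
Change the text of the remaining middle run to the replacement text and clone the formatting from the prior (original start) run. Maybe this last bit is something you do at the time of the start split.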
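The first three steps can be sketched in plain Python. This is only an illustration working on a list of run texts; the function names `run_spans` and `runs_touching` are hypothetical and not part of python-pptx.

```python
def run_spans(run_texts):
    """Return (run_idx, prefix_len, length) for each run text."""
    spans = []
    prefix_len = 0
    for idx, text in enumerate(run_texts):
        spans.append((idx, prefix_len, len(text)))
        prefix_len += len(text)
    return spans


def runs_touching(run_texts, target):
    """Locate the first occurrence of target in the joined run text.

    Returns (start, end, run_indices), or None if target is absent.
    run_indices lists the runs that contain part of the match.
    """
    start = "".join(run_texts).find(target)
    if start < 0:
        return None
    end = start + len(target)
    touched = [
        idx
        for idx, prefix_len, length in run_spans(run_texts)
        if length > 0 and prefix_len < end and prefix_len + length > start
    ]
    return start, end, touched


# Made-up run texts with lengths 8, 4 and 9, matching the tuples above.
texts = ["The quic", "k br", "own fox j"]
spans = run_spans(texts)
match = runs_touching(texts, "quick")
```

The first entry in the returned index list is the starting run, the last is the ending run, and anything in between is a "middle" run.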

This preserves any runs that don't involve the search string, and it carries the formatting of the "matched" word over to the "replaced" word.

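This requires a few operations that the current API doesn't directly support. For those you would need lower-level lxml calls to manipulate the XML directly, although you can get hold of all the existing elements you need from python-pptx objects without having to parse the XML yourself.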
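If dropping down to lxml is more than you need, a simpler variant is possible that only assigns to each run's text. Note that this is a different technique from the split-and-clone approach described above: it does not split runs, so the replacement simply takes on the formatting of the run in which the match starts. The helper name `replace_across_runs` is hypothetical, and the sketch assumes only that each run object has a readable and writable `.text` attribute, as python-pptx runs do.

```python
from types import SimpleNamespace


def replace_across_runs(runs, target, replacement):
    """Replace the first occurrence of target, editing run.text in place.

    The replacement goes into the run where the match starts; matched
    characters are trimmed from any later runs. Returns True on success.
    """
    if not target:
        return False
    start = "".join(run.text for run in runs).find(target)
    if start < 0:
        return False
    end = start + len(target)

    offset = 0
    placed = False
    for run in runs:
        text = run.text
        run_start, run_end = offset, offset + len(text)
        offset = run_end
        if run_end <= start or run_start >= end:
            continue  # this run holds no part of the match
        head = text[:max(start - run_start, 0)]  # text before the match
        tail = text[end - run_start:]            # text after the match
        if not placed:
            run.text = head + replacement + tail
            placed = True
        else:
            run.text = tail
    return True


# Stand-in objects for demonstration; real code would pass paragraph.runs.
demo_runs = [SimpleNamespace(text=t) for t in ["The quic", "k br", "own fox"]]
replaced = replace_across_runs(demo_runs, "quick brown", "slow red")
demo_texts = [run.text for run in demo_runs]
```

Runs emptied by the replacement are left in place rather than deleted, since removing them would again require working on the XML.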

Answer 2

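Here is an adapted version of the code I'm using (inspired by @scanny's answer). It replaces text in every shape on a slide that has a text frame. Note that the replacement is done run by run, so a search string that straddles two runs will not be matched.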

from pptx import Presentation

prs = Presentation('../../test.pptx')
slide = prs.slides[1]

# iterate through all shapes on slide
for shape in slide.shapes:
    if not shape.has_text_frame:
        continue
        
    # iterate through paragraphs in shape
    for p in shape.text_frame.paragraphs:
        # store formats and their runs by index (not dict because of duplicate runs)
        formats, newRuns = [], []

        # iterate through runs
        for r in p.runs:
            # get text
            text = r.text

            # replace text
            text = text.replace('s','xyz')

            # store run
            newRuns.append(text)

            # store format
            formats.append({'size':r.font.size,
                            'bold':r.font.bold,
                            'underline':r.font.underline,
                            'italic':r.font.italic})

        # clear paragraph
        p.clear()

        # iterate through new runs and formats and write to paragraph
        for i in range(len(newRuns)):
            # add run with text
            run = p.add_run()
            run.text = newRuns[i]

            # format run
            run.font.bold = formats[i]['bold']
            run.font.italic = formats[i]['italic']
            run.font.size = formats[i]['size']
            run.font.underline = formats[i]['underline']

prs.save('../../test.pptx')
