Programmatically merge two docx files with Python, keeping the character styles within the merged paragraphs

Question

I'm trying to programmatically merge two Microsoft Word files:

and:

I wrote a program using python-docx:

from docx import Document
t1 = Document("test1.docx")
t2 = Document("test2.docx")
for p in t2.paragraphs:
    t1.add_paragraph(p.text,p.style)
t1.save("test1-new.docx")

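I got this result:

As you can see, I got the text and the basic paragraph style, but lost the character-level styles.

Is there any way to preserve them?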

Answer 1

I ran a small test with a document like this:

Hello

from docx import Document
t1 = Document("test.docx")

for p in t1.paragraphs:
    for run in p.runs:
        #print([method for method in dir(run.style)])
        print(run.font.bold, run.font.italic)

Returns:

None None
True None
True True

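So, with a little more effort, you can extract bold and italic from the runs within a paragraph. A value of None means the property is not set directly on the run and is inherited from the style.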
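Building on the test above, here is a minimal sketch of collecting that per-run information into a list. The helper names `run_formatting` and `dump_formatting` are hypothetical, not part of python-docx; the only library attributes assumed are `paragraph.runs`, `run.text`, `run.bold`, `run.italic`, and `run.underline`.

```python
def run_formatting(paragraph):
    """Return (text, bold, italic, underline) for each run in a paragraph.

    bold and italic are tri-state in python-docx: True, False, or None
    (None means the value is inherited from the style hierarchy).
    underline may additionally be a WD_UNDERLINE member.
    """
    return [(r.text, r.bold, r.italic, r.underline) for r in paragraph.runs]


def dump_formatting(path):
    """Print the run formatting of every paragraph in a .docx file."""
    from docx import Document  # imported lazily; requires python-docx

    for p in Document(path).paragraphs:
        for text, bold, italic, underline in run_formatting(p):
            print(repr(text), bold, italic, underline)
```

Calling `dump_formatting("test.docx")` prints one line per run, which makes it easy to see where the formatting changes inside a paragraph.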

Answer 2

Here is working code:

#!/usr/bin/env python3.6

import os
from docx import Document

def append_to_doc(doc, fname):
    """Append every paragraph of the file `fname` to `doc`, run by run."""
    t = Document(fname)
    for p in t.paragraphs:
        new_p = doc.add_paragraph("", p.style)  # empty paragraph in the matching style
        for r in p.runs:
            # Copy each run separately so its character formatting survives.
            nr = new_p.add_run(r.text)
            nr.bold = r.bold
            nr.italic = r.italic
            nr.underline = r.underline


if __name__=="__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--output",help="Output file")
    parser.add_argument("--template",help="Base file")
    parser.add_argument("files",nargs="+",help="Files to add")
    args = parser.parse_args()

    if not args.output:
        raise RuntimeError("--output required")
    if os.path.exists(args.output):
        raise RuntimeError(f"{args.output} exists")
    if not args.template:
        raise RuntimeError("--template required")


    doc = Document(args.template)
    for fname in args.files:
        append_to_doc(doc,fname)
    doc.save(args.output)
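The code above copies only bold, italic, and underline. If the source documents also use other direct character formatting, the same idea might be extended roughly as follows. This is a sketch under assumptions: `FONT_ATTRS`, `copy_run_format`, and `append_with_formatting` are hypothetical names, the attribute list is not exhaustive, and character styles (`run.style`) are deliberately not copied because a style defined only in the source file may not exist in the target document.

```python
# Direct font properties exposed by python-docx's Font object.
# Illustrative selection, not a complete list.
FONT_ATTRS = ("bold", "italic", "underline", "strike",
              "superscript", "subscript", "name", "size")


def copy_run_format(src_run, dst_run):
    """Copy explicitly set font properties from src_run to dst_run.

    Properties that are None on the source (inherited from a style)
    are skipped, so the destination keeps its own inherited value.
    """
    for attr in FONT_ATTRS:
        value = getattr(src_run.font, attr)
        if value is not None:
            setattr(dst_run.font, attr, value)
    rgb = src_run.font.color.rgb
    if rgb is not None:
        dst_run.font.color.rgb = rgb


def append_with_formatting(doc, fname):
    """Append the paragraphs of `fname` to `doc`, preserving run formatting."""
    from docx import Document  # imported lazily; requires python-docx

    for p in Document(fname).paragraphs:
        new_p = doc.add_paragraph("", p.style)
        for r in p.runs:
            copy_run_format(r, new_p.add_run(r.text))
```

`append_with_formatting` takes the same arguments as `append_to_doc`, so it could be called in its place in the main block. Skipping None values is a design choice: it avoids writing explicit overrides into runs that merely inherit their look from the paragraph style.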
