How to output the correct statement in a loop using Python

Question

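I'm currently writing some Python (version 3.10.4) code in PyCharm (Community Edition 2021.3.3) using the python-docx library (version 0.8.1.1). The code is meant to determine whether text formatted with the 'Normal' style (in a Word document) is in specific fonts (Times New Roman only, or Times New Roman together with Cambria Math). When I run the code, the statements it prints are not what I want.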

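What I mean is: if all the text (in the 'Normal' style) is Times New Roman, it should print 'Body text is in Times New Roman'; if the text contains both Times New Roman and Cambria Math, it should print 'Body text is Times New Roman and Cambria Math'; and if the text is neither all Times New Roman nor a combination of Times New Roman and Cambria Math, it should print 'Unrecognised body text font'.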

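When I run the code (shown below), it prints a mix of 'Body text is in Times New Roman' and 'Unrecognised body text font' (each printed as many times as the corresponding case occurs in the document). The Word document contains the following fonts: Times New Roman, Cambria Math, and Arial (for testing purposes only). So it should print 'Unrecognised body text font' (because the text is neither all Times New Roman nor a combination of Times New Roman and Cambria Math).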

import docx  # import the python-docx library
WordFile = docx.Document("my file directory")  # Word document file directory for python-docx to access

for paragraph in WordFile.paragraphs:
    name = []
    if 'Normal' == paragraph.style.name:
        for run in paragraph.runs:
            name.append(run.font.name)
            for i in name:
                if i == 'Times New Roman':   # checks if the elements in name = [] are 'Times New   Roman'
                    print("Body text is in Times New Roman")  # print this statement if all  elements in name = [] are 'Times New Roman'
                elif i == 'Times New Roman' and i == 'Cambria Math':  # checks if the elements in name = [] are 'Times New Roman' and 'Cambria Math'
                    print("Body text is Times New Roman and Cambria Math")  # print this if fonts in name = [] are both 'Times New Roman' and 'Cambria Math'
                else:
                    print("Unrecognised body text font")  # print this if fonts in name = [] are neither all 'Times New Roman' nor a combo of 'Times New
                    # Roman' and 'Cambria Math'

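I think the problem lies in the loop that checks whether all the elements in the list name = [] are a particular font. A print statement should run only when all elements of the list satisfy the given condition, and only one statement should be printed, not the mix currently produced. But I can't seem to solve this. Any help would be much appreciated. An image of the current output is attached.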

Answer 1

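If I understood the question correctly, I think the simplest way to edit your current implementation is to keep track of the fonts you have identified. You can store the identified fonts in a set and check what was found after the loop. So something like this: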

import docx  # import the python-docx library
WordFile = docx.Document("my file directory")  # Word document file directory for python-docx to access

font_names = set()
for paragraph in WordFile.paragraphs:
    if 'Normal' == paragraph.style.name:
        for run in paragraph.runs:
            font_name = run.font.name
            # Recognised fonts are recorded; anything else marks the text as unrecognised
            if font_name in ('Times New Roman', 'Cambria Math'):
                font_names.add(font_name)
            else:
                font_names.add("Unrecognized")
                break
        else:
            continue
        break

if "Unrecognized" in font_names:
    print("Unrecognised body text font")
elif 'Times New Roman' in font_names:
    if len(font_names) == 2 and 'Cambria Math' in font_names:
        print("Body text is Times New Roman and Cambria Math")
    elif len(font_names) == 1:
        print("Body text is in Times New Roman")

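I assumed you don't want a printout for each paragraph but a single one for the whole text. Feel free to ask if you have any questions about the changes.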
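The `for ... else` with `continue` and `break` in the code above may look unusual, so here is a minimal sketch of that pattern on plain lists. The helper name `first_unexpected` and the sample data are hypothetical and only stand in for paragraphs and their runs; nothing here touches python-docx.

```python
ALLOWED = {"Times New Roman", "Cambria Math"}


def first_unexpected(paragraphs):
    """Return the first font name outside ALLOWED, or None (hypothetical helper).

    `paragraphs` is a list of lists of font names, standing in for the
    runs of each paragraph.
    """
    unexpected = None
    for fonts in paragraphs:
        for name in fonts:
            if name not in ALLOWED:
                unexpected = name
                break      # leaves the inner loop only
        else:
            continue       # inner loop ended without break: go to the next paragraph
        break              # inner loop was broken: leave the outer loop as well
    return unexpected


sample = [["Times New Roman"], ["Cambria Math", "Arial", "Courier"], ["Verdana"]]
print(first_unexpected(sample))  # Arial
```

The `else` branch of a `for` loop runs only when the loop finished without hitting `break`, which is what lets a single unexpected font stop the scan of the whole document.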

Answer 2

I think you are looking for this kind of functionality:

# return True if both Times New Roman and Cambria Math appear in your final list
all(i in name for i in ['Times New Roman', 'Cambria Math'])

Or perhaps:

# return True if *only* Times New Roman and Cambria Math appear in the final list
all(i in ['Times New Roman', 'Cambria Math'] for i in name)
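To make the difference between these two checks concrete, here is a small sketch that wraps each expression in a function and runs both on sample lists. The function names `contains_all` and `contains_only` and the sample lists are made up for illustration.

```python
wanted = ["Times New Roman", "Cambria Math"]


def contains_all(names):
    # True if every wanted font appears somewhere in names
    return all(font in names for font in wanted)


def contains_only(names):
    # True if every entry of names is one of the wanted fonts
    return all(font in wanted for font in names)


mixed = ["Times New Roman", "Cambria Math", "Arial"]
only_tnr = ["Times New Roman", "Times New Roman"]

print(contains_all(mixed), contains_only(mixed))        # True False
print(contains_all(only_tnr), contains_only(only_tnr))  # False True
```

Note that `contains_only([])` is `True`, because `all` over an empty sequence is `True`, so an empty list of fonts may need separate handling.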

Without understanding the rest of your logic, there seem to be a few other problems in the code:

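The font check should probably be moved out of the inner loop (dedented), so that it runs only after the information for each paragraph (or the whole document?) has been collected.
The font names seem to be appended to the list unnecessarily, which causes duplicates and slows down the final iteration over the list. Does the list need to be analysed at the end? If not, consider checking for membership before appending, or using another data type such as a set to avoid duplicates.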
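The duplication point can be seen in a short sketch. The list `run_fonts` is made-up sample data standing in for the font names of consecutive runs.

```python
run_fonts = ["Times New Roman", "Times New Roman", "Cambria Math", "Times New Roman"]

as_list = []
as_set = set()
for name in run_fonts:
    as_list.append(name)  # keeps every occurrence, including repeats
    as_set.add(name)      # keeps each distinct name once

print(len(as_list))    # 4
print(sorted(as_set))  # ['Cambria Math', 'Times New Roman']
```

With a set, the final decision depends only on which fonts occur, not on how many runs use them.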

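Now, a simple modification of your logic gives the desired behaviour. I implemented one variant of the list check, but you can pick whichever variant suits your use case.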

import docx  # import the python-docx library

# Word document file directory for python-docx to access
WordFile = docx.Document("test1.docx")

font_names = set()
for paragraph in WordFile.paragraphs:
    if "Normal" == paragraph.style.name:
        for run in paragraph.runs:
            if run.font.name is not None and run.font.name not in font_names:
                font_names.add(run.font.name)

if {"Times New Roman", "Cambria Math"} == font_names:
    # both fonts, and nothing else, were found in the 'Normal' paragraphs
    print("Body text is Times New Roman and Cambria Math")
elif {"Times New Roman"} == font_names:
    # 'Times New Roman' is the only font found in the 'Normal' paragraphs
    print("Body text has Times New Roman")
else:
    print("Unrecognised body text font")
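The code above skips runs whose `run.font.name` is `None`, which python-docx reports when the font is not set directly on the run and is inherited instead. If those runs should count too, one possible approach is to fall back to the font of the paragraph's style. The sketch below is a simplification under stated assumptions: `effective_font_name` is a hypothetical helper, it looks only at `paragraph.style.font.name` and does not follow base styles, character styles, or document defaults, and the `SimpleNamespace` objects merely imitate the attribute layout of python-docx runs and paragraphs so the example runs without a Word file.

```python
from types import SimpleNamespace


def effective_font_name(run, paragraph):
    """Best-effort font name for a run (hypothetical, simplified helper)."""
    if run.font.name is not None:
        return run.font.name          # font set directly on the run
    return paragraph.style.font.name  # fall back to the paragraph style; may still be None


# Stand-ins that imitate the attributes used above (not real python-docx objects)
style = SimpleNamespace(font=SimpleNamespace(name="Times New Roman"))
paragraph = SimpleNamespace(style=style)
explicit_run = SimpleNamespace(font=SimpleNamespace(name="Arial"))
inherited_run = SimpleNamespace(font=SimpleNamespace(name=None))

print(effective_font_name(explicit_run, paragraph))   # Arial
print(effective_font_name(inherited_run, paragraph))  # Times New Roman
```

In the loop above, the result of such a helper could be added to `font_names` in place of `run.font.name`, still skipping any value that remains `None`.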
