Using Python to parse processing instructions from XML

Question

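I'm using Python to look inside these files: each zip contains a single XML file whose base name matches that of the zip file. Each XML file is a concatenation of thousands of individual XML documents, which I have split into separate files. Some of these XML files contain a tag that looks like this, and I'm having trouble finding it in the parse tree. So far I'm using the following code: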

import xml.etree.ElementTree as ET

fname = 'extracted_xmls/ipg140107/1163_G_08622343.xml'

parsed = ET.parse(fname)
root = parsed.getroot()

# Locate the publication reference; its position depends on the document type.
if root.tag == "us-patent-grant":
    bibref = root.find('us-bibliographic-data-grant')
    pubref = bibref.find('publication-reference')
    prefix = "G"
elif root.tag == "sequence-cwu":
    pubref = root.find('publication-reference')
    prefix = "S"
else:
    print(fname, "...uncoded tag")

# Print the tag of every element under each <description>.
for desc in root.iter('description'):
    print(desc.tag)
    for child in desc.iter():
        print(child.tag)

# Try to look up the processing instruction as if it were an ordinary element.
for pi in root.findall('?GOVINT'):
    print(pi)

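But nothing shows up. I believe these special tags with a leading question mark are called "processing instructions," but I don't know how to extract them. Any comments, pointers, and especially code snippets for traversing them would be greatly appreciated.

The ElementTree documentation says that parse ignores any comments and processing instructions. So the question now is: is there a parser that doesn't do this?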

Answer 1

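The short answer: tags with a leading question mark are not really tags. They are "processing instructions." According to the ElementTree documentation, the default parser ignores processing instructions while parsing, so they never appear in the resulting tree and find/findall cannot match them.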
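
One possible way around this, offered as a sketch rather than a definitive fix: on Python 3.8 and later, `xml.etree.ElementTree.TreeBuilder` accepts `insert_pis=True`, which keeps processing instructions in the tree as nodes whose `tag` is the `ET.ProcessingInstruction` factory function. The helper name `find_processing_instructions` and the sample XML below are made up for illustration; the data inside the real `GOVINT` instructions may look different.

```python
import xml.etree.ElementTree as ET


def find_processing_instructions(xml_text, target=None):
    """Return (target, data) pairs for processing instructions in xml_text.

    Hypothetical helper for illustration. Requires Python 3.8+,
    where TreeBuilder gained the insert_pis option.
    """
    # Ask the tree builder to keep processing instructions in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    root = ET.fromstring(xml_text, parser=parser)

    found = []
    for node in root.iter():
        # Processing-instruction nodes use the factory function as their tag.
        if node.tag is ET.ProcessingInstruction:
            # node.text holds "target data"; split on the first space.
            pi_target, _, pi_data = node.text.partition(" ")
            if target is None or pi_target == target:
                found.append((pi_target, pi_data))
    return found


# Made-up sample document, not taken from the real patent files.
sample = (
    "<us-patent-grant><description>"
    '<?GOVINT description="Government Interest"?>'
    "<p>Some text</p>"
    "<?OTHER x?>"
    "</description></us-patent-grant>"
)

for pi_target, pi_data in find_processing_instructions(sample, "GOVINT"):
    print(pi_target, pi_data)
```

To read from a file instead of a string, the same parser object can be passed as `ET.parse(fname, parser=parser)`.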
