How do I save an Element Tree to a list based on an attribute in a child tag using Python's LXML module?

Question

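I have an XML document that I need to parse. I am using Python 3.8 and the lxml module.

The XML contains Title elements that each have other child elements, as in the sample below. I need to find only the Titles whose Event has type "change" and keep those Title elements in a list. I want to keep all of the tags under each such Title so that I can extract the data I need.

Here is a sample of my XML: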

'''
<root>
    <Title ref="111111">
        <Events>
            <Event type="change"/>
        </Events>
        <tag1>John</tag1>
        <tag2>A.</tag2>
        <tag3>Smith</tag3>
    </Title>
        <Title ref="222222">
        <Events>
            <Event type="cancel"/>
        </Events>
        <tag1>Bob</tag1>
        <tag2>B.</tag2>
        <tag3>Hope</tag3>
    </Title>
        <Title ref="333333">
        <Events>
            <Event type="change"/>
        </Events>
        <tag1>Julie</tag1>
        <tag2>A.</tag2>
        <tag3>Moore</tag3>
    </Title>
        <Title ref="444444">
        <Events>
            <Event type="cancel"/>
        </Events>
        <tag1>First</tag1>
        <tag2>M</tag2>
        <tag3>Last</tag3>
    </Title>
</root>
'''

I have tried the findall() function, but it seems to keep only the "Event" tag rather than the "Title" tag with all of its children. I get the same result with xpath.

Answer 1

If txt is the XML snippet from the question, you can do this to extract the <Title> tags that contain <Event type="change">:

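from lxml import etree, html

# txt holds the XML snippet from the question
root = etree.fromstring(txt)

# The predicate keeps only <Title> elements that have a descendant
# <Event type="change">; xpath() returns them as a list of elements
for title in root.xpath('.//Title[.//Event[@type="change"]]'):
    # html.tostring() serializes as HTML, so the empty <Event> gets an explicit closing tag
    print(html.tostring(title).decode('utf-8'))
    print('-' * 80)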

Prints:

<Title ref="111111">
        <Events>
            <Event type="change"></Event>
        </Events>
        <tag1>John</tag1>
        <tag2>A.</tag2>
        <tag3>Smith</tag3>
    </Title>
        
--------------------------------------------------------------------------------
<Title ref="333333">
        <Events>
            <Event type="change"></Event>
        </Events>
        <tag1>Julie</tag1>
        <tag2>A.</tag2>
        <tag3>Moore</tag3>
    </Title>
        
--------------------------------------------------------------------------------
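Since the question asks for the matching Title elements in a list rather than printed, here is a minimal sketch of that step. It assumes a shortened copy of the sample XML, and the names changed_titles and records (and the "name" key) are illustrative, not part of the original answer. The result of xpath() is already a Python list of elements, so each Title keeps its attributes and children for later extraction.

```python
from lxml import etree

# Shortened copy of the sample XML from the question (assumption: same structure)
txt = """<root>
    <Title ref="111111">
        <Events><Event type="change"/></Events>
        <tag1>John</tag1><tag2>A.</tag2><tag3>Smith</tag3>
    </Title>
    <Title ref="222222">
        <Events><Event type="cancel"/></Events>
        <tag1>Bob</tag1><tag2>B.</tag2><tag3>Hope</tag3>
    </Title>
    <Title ref="333333">
        <Events><Event type="change"/></Events>
        <tag1>Julie</tag1><tag2>A.</tag2><tag3>Moore</tag3>
    </Title>
</root>"""

root = etree.fromstring(txt)

# xpath() returns a plain list of the matching <Title> elements
changed_titles = root.xpath('.//Title[.//Event[@type="change"]]')

# Pull the data out of each kept element (hypothetical record layout)
records = [
    {
        "ref": title.get("ref"),
        "name": " ".join(
            title.findtext(tag, default="") for tag in ("tag1", "tag2", "tag3")
        ),
    }
    for title in changed_titles
]

for record in records:
    print(record)
```

The key point is that the predicate in square brackets filters on the Event but the path itself ends at Title, so Title is what gets returned. A path that ends in Event would return the Event elements instead, which matches the behaviour described in the question.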
