Get etree Element with attribute, or containing subelement with attribute

Question

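I have an XML file that I'm parsing, and I need to find elements by id.

In the example code, I need to find the name of the driver, but I don't know whether my id belongs to a vehicle, an engine, or a block. I'd like a solution that works with arbitrary XML inside vehicle (the existence of a driver is guaranteed).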

<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532">
            <type>V8</type>
            <block id="113">
                <material>Aluminium</material>
            </block>
        </engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212">
            <type>Inline 6</type>
            <block id="381">
                <material>Cast Iron</material>
            </block>
        </engine>
    </vehicle>
</road>

What I've tried

I tried getting the elements by their id and then, if they weren't the vehicle tag, navigating up the tree to find it, but it seems Python's elem.find() returns None if the result is outside elem.

Looking at the docs, they have this example:

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

But I don't see how to make that work for any descendant, rather than only for a descendant at a specific level.

Answer 1

Note: all the snippets below use the lxml library. To install it, run: pip install lxml.

You should use root.xpath(..) instead of root.findall(..).

>>> root.xpath("//vehicle/driver/text()")
['Bob Johnson', 'Dave Edwards']

If you want to extract the driver's name for a given ID, you would do:

>>> vehicle_id = "16"
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (vehicle_id, vehicle_id))
['Bob Johnson']

Update: to get the driver's name when the given id is nested at any deeper level, you would do:

>>> i = '16'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '532'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
>>> i = '113'
>>> root.xpath("//vehicle[@id='%s' or .//*[@id='%s']]/driver/text()" % (i, i))
['Bob Johnson']
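
As a possible variation on the queries above, lxml lets you pass XPath variables as keyword arguments to xpath(), which avoids building the expression with % formatting. The sketch below is only an illustration: the DATA string is a trimmed copy of the question's XML, and the names QUERY, driver_names and wanted are made up for this example.

```python
from lxml import etree

# Trimmed copy of the XML from the question (assumption: same structure).
DATA = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532"><block id="113"/></engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212"><block id="381"/></engine>
    </vehicle>
</road>
"""

root = etree.fromstring(DATA)

# $wanted is an XPath variable; lxml fills it in from the keyword argument.
QUERY = "//vehicle[@id=$wanted or .//*[@id=$wanted]]/driver/text()"


def driver_names(root, wanted_id):
    """Return driver names of vehicles that have, or contain, the given id."""
    # Pass the id as a string so it is compared as text, not as a number.
    return [str(name) for name in root.xpath(QUERY, wanted=str(wanted_id))]


for wanted_id in ("16", "532", "113", "381"):
    print(wanted_id, driver_names(root, wanted_id))
```

Because the id is passed as a variable rather than pasted into the expression, quotes inside the value cannot break the query.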

Answer 2

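If you know the id but don't know whether it belongs to a vehicle, an engine, or a block, you can handle it with an XPath expression, but you have to use lxml.etree instead of xml.etree.ElementTree (which has very limited XPath support). Use the ancestor-or-self axis: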

input_id = "your ID"
print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)

This would print:

Bob Johnson if input_id is 16, 532, or 113
Dave Edwards if input_id is 452, 212, or 381

Complete working example:

import lxml.etree as ET

data = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532">
            <type>V8</type>
            <block id="113">
                <material>Aluminium</material>
            </block>
        </engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212">
            <type>Inline 6</type>
            <block id="381">
                <material>Cast Iron</material>
            </block>
        </engine>
    </vehicle>
</road>
"""

root = ET.fromstring(data)
for input_id in [16, 532, 113, 452, 212, 381]:
    print(root.xpath(".//*[@id='%s']/ancestor-or-self::vehicle/driver" % input_id)[0].text)

Prints:

Bob Johnson
Bob Johnson
Bob Johnson
Dave Edwards
Dave Edwards
Dave Edwards
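
If installing lxml is not an option, the same lookup can probably be done with the standard library alone by walking each vehicle's subtree instead of using the ancestor-or-self axis. This is a rough sketch, not part of the answer above: driver_for_id is a hypothetical helper name, and the DATA string is a trimmed copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (assumption: same structure).
DATA = """
<road>
    <vehicle id="16">
        <driver>Bob Johnson</driver>
        <engine id="532"><block id="113"/></engine>
    </vehicle>
    <vehicle id="452">
        <driver>Dave Edwards</driver>
        <engine id="212"><block id="381"/></engine>
    </vehicle>
</road>
"""


def driver_for_id(root, target_id):
    """Return the driver of the vehicle that has, or contains, target_id."""
    for vehicle in root.iter("vehicle"):
        # Element.iter() yields the vehicle itself first, then every
        # descendant, so this covers the vehicle, engine and block cases.
        if any(el.get("id") == target_id for el in vehicle.iter()):
            return vehicle.findtext("driver")
    return None  # no vehicle has or contains this id


root = ET.fromstring(DATA)
for input_id in ["16", "532", "113", "452", "212", "381"]:
    print(input_id, driver_for_id(root, input_id))
```

This searches from the vehicle downward rather than from the matched element upward, which sidesteps the fact that ElementTree elements do not keep a reference to their parent.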
