python xml.dom - access keys that are child of another key

I am trying to access data from an XML file structured like this:

<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>

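I want to go through all the name attributes and, when one matches, retrieve the description. So far I can retrieve all the item elements and print the name field, but I can't find a way to access the child tags description or year.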

from xml.dom import minidom

xmldoc = minidom.parse("myfile.xml")
# This does retrieve all the item elements 
itemlist = xmldoc.getElementsByTagName('item')
print(len(itemlist))
# This does print the name of the first element
print(itemlist[0].attributes['name'].value)
# This gives me a KeyError, although I can see that child element 1 of itemlist is the description
print(itemlist[1].attributes['description'].value)

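I'm not sure how to access the child elements, since they are children of the item element. Do I need to build another list from the list of item elements in order to retrieve the description key and access its value? Or am I completely off track?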

Here is one way to extract the data. I'm not sure it's the most elegant, but it works:

for item in xmldoc.getElementsByTagName("item"):
    name = item.attributes.getNamedItem("name").value
    print(f"name is {name}") 
    desc = item.getElementsByTagName("description")[0].childNodes[0].data
    print(f"description is {desc}")

The output is:

name is jack
description is the headhunter
name is joe
description is the butler
name is david
description is guest
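
Since the question asks for the description (and year) of one matching name rather than of every item, the loop above can be narrowed to a lookup. This is only a sketch: the helper names `child_text` and `find_item` are made up for illustration and are not part of minidom, and the XML is inlined with `parseString` so the snippet runs without myfile.xml.

```python
from xml.dom import minidom

XML = """<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
</datafile>"""


def child_text(element, tag):
    """Hypothetical helper: text of the first <tag> below element, or None."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


def find_item(doc, name):
    """Hypothetical helper: description and year of the item with this name."""
    for item in doc.getElementsByTagName("item"):
        # name is an attribute of <item>, so getAttribute works here
        if item.getAttribute("name") == name:
            # description and year are child elements, not attributes
            return {
                "description": child_text(item, "description"),
                "year": child_text(item, "year"),
            }
    return None


doc = minidom.parseString(XML)
print(find_item(doc, "joe"))
```

Searching from `item` rather than from the document keeps the header's own description element out of the result.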

Note that the documentation for minidom is a bit sparse. However, it (mostly) implements the DOM standard, so the documentation for that standard applies.
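
As for why the code in the question raises a KeyError: in the DOM, `attributes` only holds what is written inside the opening tag (here, name), while description and year are child nodes. A small sketch, using a single item inlined as a string purely for illustration:

```python
from xml.dom import minidom

# One <item> from the question, inlined so the snippet is self-contained
doc = minidom.parseString(
    '<item name="joe">\n'
    '    <description>the butler</description>\n'
    '    <year>1995</year>\n'
    '</item>'
)
item = doc.documentElement

# Only "name" is an attribute of <item>
has_name = item.hasAttribute("name")
has_description = item.hasAttribute("description")
print(has_name, has_description)

# childNodes also contains the whitespace between tags as text nodes,
# so filter on nodeType to keep only the element children
element_children = [
    node.tagName for node in item.childNodes
    if node.nodeType == node.ELEMENT_NODE
]
print(element_children)
```

The whitespace text nodes are the reason description shows up at index 1 of `childNodes` rather than index 0.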

A one-liner, using ElementTree

import xml.etree.ElementTree as ET

xml = '''
<datafile>
    <header>
        <name>catalogue</name>
        <description>the description</description>
    </header>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
    <item name="david">
        <description>guest</description>
        <year>2000</year>
    </item>
</datafile>'''

root = ET.fromstring(xml)
data = [(i.attrib['name'],i.find('./description').text) for i in root.findall('.//item')]
print(data)

Output

[('jack', 'the headhunter'), ('joe', 'the butler'), ('david', 'guest')]
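
If only one name is of interest, ElementTree's limited XPath support lets the attribute match go into the path itself. A sketch, assuming the name being looked up is "joe" and using a shortened copy of the XML:

```python
import xml.etree.ElementTree as ET

xml = '''
<datafile>
    <item name="jack">
        <description>the headhunter</description>
        <year>1981</year>
    </item>
    <item name="joe">
        <description>the butler</description>
        <year>1995</year>
    </item>
</datafile>'''

root = ET.fromstring(xml)

# [@name='joe'] selects only the item whose name attribute equals "joe"
item = root.find(".//item[@name='joe']")

description = year = None
if item is not None:
    # findtext returns the text of the first matching child, or None
    description = item.findtext("description")
    year = item.findtext("year")

print(description, year)
```

`find` returns None when nothing matches, so the check before reading the children avoids an AttributeError for an unknown name.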