How do I access elements in an XML when multiple default namespaces are used?

Question

I would expect this code to produce a non-empty list:

import xml.etree.ElementTree as et

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<A
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="a:namespace">
    <B xmlns="b:namespace">
        <C>"Stuff"</C>
    </B>
</A>
'''
namespaces = {'a' : 'a:namespace', 'b' : 'b:namespace'}
xroot = et.fromstring(xml)

res = xroot.findall('b:C', namespaces)

Instead, res is an empty list. Why?

When I inspect the contents of xroot, I can see that the C element is in b:namespace, as expected:

for x in xroot.iter():
    print(x)

# result:
<Element '{a:namespace}A' at 0x7f56e13b95e8>
<Element '{b:namespace}B' at 0x7f56e188d2c8>
<Element '{b:namespace}C' at 0x7f56e188def8>

To check whether my namespace mapping was the problem, I also tried xroot.findall('{b:namespace}C'), but the result is an empty list as well.

Answer 1

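Your findall XPath 'b:C' only searches the direct children of the root element, and C is a grandchild of the root, not a child. Change the path to './/b:C' so that the tag is found anywhere in the tree, and it works. For example: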

import xml.etree.ElementTree as et

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<A
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="a:namespace">
    <B xmlns="b:namespace">
        <C>"Stuff"</C>
    </B>
</A>
'''
namespaces = {'a' : 'a:namespace', 'b' : 'b:namespace'}
xroot = et.fromstring(xml)

######## changed the xpath to start with .//
res = xroot.findall('.//b:C', namespaces)

print( f"{res=}" )

for x in xroot.iter():
    print(x)

Output:

res=[<Element '{b:namespace}C' at 0x00000222DFCAAA40>]
<Element '{a:namespace}A' at 0x00000222DFCAA9A0>
<Element '{b:namespace}B' at 0x00000222DFCAA9F0>
<Element '{b:namespace}C' at 0x00000222DFCAAA40>
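
To make the "direct children only" behaviour concrete, here is a minimal sketch using the same document (with the XML declaration and the unused xsi prefix dropped for brevity). The variable names direct_c, direct_b and explicit_c are my own, chosen only for illustration:

```python
import xml.etree.ElementTree as et

xml = '''<A xmlns="a:namespace">
    <B xmlns="b:namespace">
        <C>"Stuff"</C>
    </B>
</A>
'''
namespaces = {'a': 'a:namespace', 'b': 'b:namespace'}
xroot = et.fromstring(xml)

# A bare tag name matches only direct children of xroot; C is a grandchild.
direct_c = xroot.findall('b:C', namespaces)
# B is a direct child of the root, so the same kind of query finds it.
direct_b = xroot.findall('b:B', namespaces)
# Spelling out each step reaches C without a descendant search.
explicit_c = xroot.findall('b:B/b:C', namespaces)

print(len(direct_c), len(direct_b), len(explicit_c))
```

If the exact position of C is known, the explicit path 'b:B/b:C' is a narrower alternative to './/b:C', which matches C elements at any depth.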

For some useful examples of ElementTree's XPath support, see https://docs.python.org/3/library/xml.etree.elementtree.html?highlight=xpath#xpath-support
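
The same descendant search should also work without a prefix mapping. The sketch below is not part of the original answer: it assumes the same document, and the {*} wildcard form assumes Python 3.8 or later. The names clark and any_ns are illustrative only:

```python
import xml.etree.ElementTree as et

xml = '''<A xmlns="a:namespace">
    <B xmlns="b:namespace">
        <C>"Stuff"</C>
    </B>
</A>
'''
xroot = et.fromstring(xml)

# Fully qualified tag name: the namespace URI in braces, no prefix dict needed.
clark = xroot.findall('.//{b:namespace}C')
# Namespace wildcard (assumes Python 3.8+): C in any namespace, or in none.
any_ns = xroot.findall('.//{*}C')

print([el.tag for el in clark])
print([el.tag for el in any_ns])
```

This also suggests why the '{b:namespace}C' attempt in the question came back empty: the namespace was right, but without the leading './/' the search still only covered direct children of the root.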

python

xml

elementtree