Parse XML namespace with Element Tree findall

Question

How do I use ElementTree's findall('Email') to query the following XML?

<DocuSignEnvelopeInformation xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
                <Type>Signer</Type>
                <Email>joe@gmail.com</Email>
                <UserName>Joe Shmoe</UserName>
                <RoutingOrder>1</RoutingOrder>
                <Sent>2015-05-04T09:58:01.947</Sent>
                <Delivered>2015-05-04T09:58:14.403</Delivered>
                <Signed>2015-05-04T09:58:29.473</Signed>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>

I have a feeling it has something to do with the namespace, but I'm not sure. I looked at the docs and had no luck.

tree = <xml.etree.ElementTree.ElementTree object at 0x7f27a47c4fd0>
root = tree.getroot()
root
<Element '{http://www.docusign.net/API/3.0}DocuSignEnvelopeInformation' at 0x7f27a47b8a48>

root.findall('Email')
[]

Answer 1

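You should read the documentation more closely, in particular the section on Parsing XML with Namespaces, which includes an example that is almost exactly what you want.

But even without the documentation, the answer is actually contained in your example output. When you print the root element of your document...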

>>> from lxml import etree  # assumed import: the repr format and xpath() below are lxml's
>>> tree = etree.parse(open('data.xml'))
>>> root = tree.getroot()
>>> root
<Element {http://www.docusign.net/API/3.0}DocuSignEnvelopeInformation at 0x7f972cd079e0>

...you can see that it printed the root element name (DocuSignEnvelopeInformation) with a namespace prefix ({http://www.docusign.net/API/3.0}). You can use this same prefix as part of your argument to findall:

>>> root.findall('{http://www.docusign.net/API/3.0}Email')

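But this by itself won't work, because it only finds Email elements that are immediate children of the root element, and here Email is nested two levels deeper. You need to provide an ElementPath expression to make findall search the entire document. This works: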

>>> root.findall('.//{http://www.docusign.net/API/3.0}Email')
[<Element {http://www.docusign.net/API/3.0}Email at 0x7f972949a6c8>]
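The difference between the three searches can be checked with the standard library's xml.etree.ElementTree, which is what the question uses. This is a minimal sketch, assuming a trimmed copy of the question's XML inlined as a string; the names XML, NS, no_namespace, direct_children and anywhere are illustrative and do not appear in the original post.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch is self-contained.
XML = """<DocuSignEnvelopeInformation xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
            <Type>Signer</Type>
            <Email>joe@gmail.com</Email>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>"""

# Namespace URI wrapped in braces, exactly as it appears in the element repr.
NS = "{http://www.docusign.net/API/3.0}"

root = ET.fromstring(XML)

# Unqualified tag name: matches nothing, because every element is namespaced.
no_namespace = root.findall("Email")

# Qualified name, but findall only looks at immediate children of root.
direct_children = root.findall(NS + "Email")

# Qualified name with './/' so the search covers every depth below root.
anywhere = root.findall(".//" + NS + "Email")

print(no_namespace, direct_children)
print([element.text for element in anywhere])
```

Only the last call returns the Email element; the first two return empty lists for the reasons described above.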

You can also perform a similar search using XPath and namespace prefixes, like this:

>>> root.xpath('//docusign:Email',
... namespaces={'docusign': 'http://www.docusign.net/API/3.0'})
[<Element {http://www.docusign.net/API/3.0}Email at 0x7f972949a6c8>]

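This lets you use XML-style namespace: prefixes instead of the {namespace-uri}tag syntax that LXML uses. Note that the xpath() method is provided by lxml; elements from the standard library's xml.etree.ElementTree do not have it.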
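With the standard library's xml.etree.ElementTree, a similar prefix-based search is possible by passing a prefix-to-URI mapping as the namespaces argument of findall, find or findtext. The following is a sketch under that assumption; the prefix "docusign" is an arbitrary label chosen here, and the names XML, NAMESPACES, emails, recipient and user_name are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch is self-contained.
XML = """<DocuSignEnvelopeInformation xmlns="http://www.docusign.net/API/3.0">
    <EnvelopeStatus>
        <RecipientStatus>
            <Type>Signer</Type>
            <Email>joe@gmail.com</Email>
            <UserName>Joe Shmoe</UserName>
        </RecipientStatus>
    </EnvelopeStatus>
</DocuSignEnvelopeInformation>"""

# The prefix is local to this mapping; it does not need to appear in the XML.
NAMESPACES = {"docusign": "http://www.docusign.net/API/3.0"}

root = ET.fromstring(XML)

# './/' searches all descendants; the prefix is expanded to the full URI.
emails = root.findall(".//docusign:Email", NAMESPACES)

# The same mapping works with find() and findtext() on any element.
recipient = root.find(".//docusign:RecipientStatus", NAMESPACES)
user_name = recipient.findtext("docusign:UserName", namespaces=NAMESPACES)

print([element.text for element in emails])
print(user_name)
```

On Python 3.8 and later, ElementTree also accepts a wildcard namespace in the path, so root.findall('.//{*}Email') matches Email in any namespace without naming the URI.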

Tags: python, xml, docusignapi