Get XML value using ElementTree in Python

Question

I have this XML file:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xml:base="https://receasy1p1942606901trial.hanatrial.ondemand.com:443/rec/Accrual_PO.xsodata/"
    xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
    xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns="http://www.w3.org/2005/Atom">
    <title type="text">accruals_po</title>
    <id>https://receasy1p1942606901trial.hanatrial.ondemand.com:443/rec/Accrual_PO.xsodata/accruals_po</id>
    <author>
        <name />
    </author>
    <link rel="self" title="accruals_po" href="accruals_po" />
    <entry>
        <id>https://receasy1p1942606901trial.hanatrial.ondemand.com:443/rec/Accrual_PO.xsodata/accruals_po('96372537-120')</id>
        <title type="text"></title>
        <author>
            <name />
        </author>
        <link rel="edit" title="accruals_po" href="accruals_po('96372537-120')"/>
        <category term="receasy.accruals_poType" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
        <content type="application/xml">
            <m:properties>
                <d:PO_NUMBER m:type="Edm.String">96372537-120</d:PO_NUMBER>
                <d:SAP_AMT m:type="Edm.Single">109</d:SAP_AMT>
                <d:GL_ACCOUNT m:type="Edm.Int64">65009000</d:GL_ACCOUNT>
                <d:COMPANY_CODE m:type="Edm.String">US10_OH</d:COMPANY_CODE>
                <d:CONFIRMED_ACCRUAL_AMT m:type="Edm.Single">109</d:CONFIRMED_ACCRUAL_AMT>
                <d:FINAL_APPROVER m:type="Edm.String">europe\bamcguir</d:FINAL_APPROVER>
                <d:FINAL_GL_ACCOUNT m:type="Edm.Int64">65009000</d:FINAL_GL_ACCOUNT>
                <d:FINAL_COMPANY_CODE m:type="Edm.String">US10_OH</d:FINAL_COMPANY_CODE>
                <d:RECONCILIATION m:type="Edm.String">Successful</d:RECONCILIATION>
            </m:properties>
        </content>
    </entry>
</feed>

I am trying to get the values listed below, which sit under the entry tag.

96372537-120

109

65009000

US10_OH

109

europe\bamcguir

65009000

US10_OH

Successful

This is the code I have so far to get the values.

import xml.etree.ElementTree as ET

tree = ET.parse('export.xml')
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib)
    for child2 in child:
        print(child2.tag, child2.attrib)
        for child3 in child2:
            print(child3.tag, child3.attrib)
            for child4 in child3:
                print(child4.tag, child4.attrib)
                for child5 in child4:
                    print(child5.tag, child5.attrib)

This is the part of the output I get for PO_NUMBER.

{http://schemas.microsoft.com/ado/2007/08/dataservices}PO_NUMBER {'{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}type': 'Edm.String'}

I am not able to get the value of PO_NUMBER, which is 96372537-120. How can I get this value, as well as the other values listed above?

Answer 1

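In ElementTree, an element's (leading) text node is stored on its text attribute. tag is the name of the XML tag (in Clark's notation), and attrib holds only the XML attributes (their names are also in Clark's notation).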

So child5.text will give you the information you need.
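As a minimal sketch of the difference between tag, attrib and text, assuming a cut-down copy of the question's feed inlined as a string (the names SAMPLE and found are illustrative, not from the question):

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the feed from the question, inlined so the sketch runs on its own.
SAMPLE = """<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
    xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <content type="application/xml">
            <m:properties>
                <d:PO_NUMBER m:type="Edm.String">96372537-120</d:PO_NUMBER>
            </m:properties>
        </content>
    </entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# Walk every element and keep those whose text is more than whitespace.
found = [
    (elem.tag, elem.attrib, elem.text)
    for elem in root.iter()
    if elem.text and elem.text.strip()
]

for tag, attrib, text in found:
    print(tag, attrib, text)
```

Only the PO_NUMBER element carries non-whitespace text here, so it is the only one collected; the value lives in text, not in tag or attrib.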

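Incidentally, you can access the content or properties elements directly by using Clark's notation, {namespace}tag, with ElementTree's regular query API. You do not have to iterate over everything by hand: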

tree.iter('{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}properties')

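will give you an iterator over all the "properties" elements in the tree. You can then iterate over each properties element and get the text of each child: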

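for properties in tree.iter('{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}properties'):
    for child in properties:
        print(child.text)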
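Putting this together, here is a rough sketch that collects each properties element into a dict keyed by field name. It assumes a shortened copy of the question's feed inlined as a string; SAMPLE, local_name and rows are hypothetical names introduced for illustration only.

```python
import xml.etree.ElementTree as ET

M_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Shortened copy of the question's feed (three of the nine properties).
SAMPLE = """<feed xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
    xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
    xmlns="http://www.w3.org/2005/Atom">
    <entry>
        <content type="application/xml">
            <m:properties>
                <d:PO_NUMBER m:type="Edm.String">96372537-120</d:PO_NUMBER>
                <d:SAP_AMT m:type="Edm.Single">109</d:SAP_AMT>
                <d:RECONCILIATION m:type="Edm.String">Successful</d:RECONCILIATION>
            </m:properties>
        </content>
    </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix from a Clark-notation tag."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(SAMPLE)

# One dict per <m:properties> element, mapping field name to its text.
rows = []
for properties in root.iter("{" + M_NS + "}properties"):
    rows.append({local_name(child.tag): child.text for child in properties})

for row in rows:
    print(row)
```

All values come back as strings; converting SAP_AMT or GL_ACCOUNT to numbers based on the m:type attribute would be a separate step.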

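Beware one oddity with mixed content (where an element can have both text and element children): in the ElementTree document model, only the first child is stored on .text, and only when it is a text node. Any later text node is stored as .tail on the element that precedes it. For example,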

<foo>
    bar
    <qux/>
    baz
</foo>

will have foo.text == "bar", but "baz" will be stored as qux.tail (in both cases with the surrounding whitespace included).
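The text/tail split can be checked with a short sketch that parses the same snippet from a string. The use of itertext() at the end is an addition here, shown as one way to gather all the text of an element regardless of where ElementTree stores it.

```python
import xml.etree.ElementTree as ET

foo = ET.fromstring("<foo>\n    bar\n    <qux/>\n    baz\n</foo>")
qux = foo.find("qux")

# Text before the first child belongs to the parent's .text.
print(repr(foo.text))

# Text after <qux/> is stored on qux.tail, not on foo.
print(repr(qux.tail))

# qux itself is empty, so its own .text is None.
print(repr(qux.text))

# itertext() yields .text and .tail pieces in document order.
pieces = "".join(foo.itertext()).split()
print(pieces)
```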
