Parsing same-named tags in XML using Python

Question

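There may be a simple solution, but I just can't find it, so I'd be glad if someone could help. I'm using Python 3.7 and trying to parse the "date" element, but only the one inside application-reference: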

                <bibliographic-data>
                <publication-reference>
                    <document-id document-id-type="docdb">
                        <country>EP</country>
                        <doc-number>1001100</doc-number>
                        <kind>A1</kind>
                        <date>20000517</date>
                    </document-id>
                    <document-id document-id-type="epodoc">
                        <doc-number>EP1000000</doc-number>
                        <date>20000517</date>
                    </document-id>

      <application-reference doc-id="17397285">
                    <document-id document-id-type="docdb">
                        <country>EP</country>
                        <doc-number>99203729</doc-number>
                        <kind>A</kind>
                    </document-id>
                    <document-id document-id-type="epodoc">
                        <doc-number>EP199903729</doc-number>
                        <date>19991108</date>
                    </document-id>
                    <document-id document-id-type="original">
                        <doc-number>993729</doc-number>
                    </document-id>
                </application-reference>

Many other dates may appear before application-reference, so I can't simply print the 4th date.

I tried simple xmldom and xml.etree queries, but none of them worked.

Since I wasn't sure how to access it, I tried:

root = ElementTree.fromstring(js).getroot()

for appl in root.findall("application-reference"):
    ElementTree.dump(appl)

Then I got stuck.

The result should be 19991108.

Answer 1

Try using lxml and XPath:

bibli = """[your xml above; make sure it's properly formatted!]"""

from lxml import etree

root = etree.fromstring(bibli)
print(root.xpath('//application-reference//date/text()'))

Output:

['19991108']
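
If installing lxml is not an option, the same lookup can probably be done with the standard library's xml.etree.ElementTree. The following is a minimal sketch, assuming the XML has been made well-formed; the sample is a trimmed copy of the question's data with the missing closing tags added, and `xml_text` and `dates` are illustrative names. It also shows two likely reasons the attempt in the question got stuck: `fromstring()` already returns the root element (so `getroot()` is not needed), and a bare tag name in `findall()` only matches direct children of the element it is called on.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the question's XML (closing tags added
# here as an assumption about the intended structure).
xml_text = """
<bibliographic-data>
    <publication-reference>
        <document-id document-id-type="docdb">
            <doc-number>1001100</doc-number>
            <date>20000517</date>
        </document-id>
    </publication-reference>
    <application-reference doc-id="17397285">
        <document-id document-id-type="docdb">
            <doc-number>99203729</doc-number>
        </document-id>
        <document-id document-id-type="epodoc">
            <doc-number>EP199903729</doc-number>
            <date>19991108</date>
        </document-id>
    </application-reference>
</bibliographic-data>
"""

# fromstring() returns the root Element directly; there is no getroot() here.
root = ET.fromstring(xml_text)

# ".//" searches all descendants of root, and the second "//" searches all
# descendants of each application-reference, so dates under
# publication-reference are skipped.
dates = [d.text for d in root.findall(".//application-reference//date")]
print(dates)
```

With the sample above this should print the single date under application-reference, regardless of how many other date elements appear earlier in the document.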
