Problems with parsing namespaced XML in Python: results come back empty

Question

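My XML file is available here. I can get the root node and its children from this file, but I can't get the element I need: I want the content of <ce:section-title>Methods</ce:section-title>. I have tried both the xml and lxml packages.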

When I use the following:

import lxml.etree

tree = lxml.etree.parse(fname)  # fname is the XML filename
root = tree.getroot()

print(root[5].findall("ce:section-title", root.nsmap))

It just gives me an empty list, []. I get the same empty list when I use the following:

for item in tree.iter('{http://www.elsevier.com/xml/ja/dtd}ce:section-title'):
    print(item)

I also tried the solution provided here, but I get the following error with this code:

ns = {"ce": "http://www.elsevier.com/xml/common/dtd"}
print(root.find("ce:title", ns).text)

AttributeError: 'NoneType' object has no attribute 'text'

Any pointers would be helpful.

Answer 1

It should work with findall(".//ce:section-title", root.nsmap).

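Prefixing the path with .// searches for section-title descendants at every level below the context node. With findall("ce:section-title", root.nsmap), only direct children of the context node are matched.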
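To see the difference in isolation, here is a minimal sketch using the standard-library xml.etree.ElementTree (which the question also tried), whose findall() and find() follow the same path rules as lxml. The document is a hypothetical, heavily trimmed stand-in: the ja and ce namespace URIs come from the question, but the nesting is invented for illustration. It also shows that find() returns None rather than an empty list when nothing matches, which is the kind of result that produces "'NoneType' object has no attribute 'text'". The same applies to root[5] in the question: the titles are usually nested deeper than its direct children.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed document; the real Elsevier file is much larger.
# Namespace URIs follow the question; the element nesting is illustrative only.
XML = """<article xmlns="http://www.elsevier.com/xml/ja/dtd"
         xmlns:ce="http://www.elsevier.com/xml/common/dtd">
  <body>
    <ce:sections>
      <ce:section>
        <ce:section-title>Introduction</ce:section-title>
      </ce:section>
      <ce:section>
        <ce:section-title>Materials and methods</ce:section-title>
      </ce:section>
    </ce:sections>
  </body>
</article>"""

ns = {"ce": "http://www.elsevier.com/xml/common/dtd"}
root = ET.fromstring(XML)

# Without ".//" only the direct children of <article> are examined: nothing matches.
direct = root.findall("ce:section-title", ns)
print(direct)  # []

# find() returns None (not an empty list) when nothing matches,
# so calling .text on its result raises AttributeError.
print(root.find("ce:section-title", ns))  # None

# With ".//" every descendant of <article> is searched.
titles = [e.text for e in root.findall(".//ce:section-title", ns)]
print(titles)  # ['Introduction', 'Materials and methods']
```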

Example:

from lxml import etree

tree = etree.parse("data.xml")  # Your XML
root = tree.getroot()

for e in root.findall(".//ce:section-title", root.nsmap):
    print(e.text)

Output:

Abstract
Keywords
Introduction
Materials and methods
Results
The appearing species by taxon
List of regional appearing species
Discussion
Acknowledgments
References
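The tree.iter() attempt in the question fails for a related reason: iter() takes a tag in Clark notation, {namespace-URI}local-name, with no prefix inside it. The sketch below assumes, as the ns dictionary in the question suggests, that ce maps to http://www.elsevier.com/xml/common/dtd; the document is again a hypothetical stand-in. It uses xml.etree.ElementTree so it runs without extra packages, but the same iter() and findtext() calls exist on lxml.etree elements. findtext() with a default avoids the NoneType error when an element is missing.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the question; the document is hypothetical.
JA = "http://www.elsevier.com/xml/ja/dtd"
CE = "http://www.elsevier.com/xml/common/dtd"

XML = f"""<article xmlns="{JA}" xmlns:ce="{CE}">
  <head><ce:title>An example title</ce:title></head>
  <body>
    <ce:section><ce:section-title>Methods</ce:section-title></ce:section>
  </body>
</article>"""

tree = ET.ElementTree(ET.fromstring(XML))

# Mixing a prefix into Clark notation, with the ja URI, matches no element.
wrong = list(tree.iter(f"{{{JA}}}ce:section-title"))

# Correct Clark notation: the ce namespace URI in braces, then the local name.
right = [e.text for e in tree.iter(f"{{{CE}}}section-title")]

# Search all descendants for ce:title; findtext() returns the default
# instead of None when the element does not exist.
ns = {"ce": CE}
title = tree.getroot().findtext(".//ce:title", default="(no title)", namespaces=ns)

print(wrong)  # []
print(right)  # ['Methods']
print(title)  # An example title
```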


Tags: python, lxml, xml-namespaces, xml-parsing