Why does extraneous text pop up when I print the nodeName?
Suppose I have the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<library attrib1="att11" attrib2="att22">
    library-text
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
        <date>2001</date>
        <author>A1 A1 A1 A1 A1</author>
        <price>10.00</price>
    </book>
    <book isbn="2222222222">
        <title lang="en">T2 T2 T2 T2 T2</title>
        <date>2002</date>
        <author>A2 A2 A2 A2 A2</author>
        <price>20.00</price>
    </book>
    <book isbn="3333333333">
        <title lang="en">T3 T3 T3 T3</title>
        <date>2003</date>
        <author>A3 A3 A3 A3 A3y</author>
        <price>30.00</price>
    </book>
</library>
main.py
import xml.dom.minidom as minidom
xml_fname = "library.xml"
dom = minidom.parse(xml_fname)
for node in dom.firstChild.childNodes:
    print(node.nodeName)
Output
#text
book
#text
book
#text
book
#text
Why does the output show #text? Where does it come from?
If you change print(node.nodeName) to print(node), you will see this output:
<DOM Text node "'\n    libra'...">
<DOM Element: book at 0x11f48ec8>
<DOM Text node "'\n    '">
<DOM Element: book at 0x11f50070>
<DOM Text node "'\n    '">
<DOM Element: book at 0x11f501d8>
<DOM Text node "'\n'">
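minidom treats "free text" as actual DOM text nodes. They have no tag name of their own, so their nodeName is reported as #text. Here the text nodes are the library-text line and the whitespace (newlines and indentation) between the book elements.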
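If you want to keep iterating over childNodes, one option is to filter on nodeType. The following is a minimal sketch, assuming a shortened inline copy of the library document; the XML string and the element_children helper are made up for illustration and are not part of minidom.

```python
import xml.dom.minidom as minidom

# Shortened inline copy of the library document (assumed for illustration).
XML = """<library>
    library-text
    <book isbn="1111111111"><title>T1</title></book>
    <book isbn="2222222222"><title>T2</title></book>
</library>"""

dom = minidom.parseString(XML)
root = dom.documentElement


def element_children(parent):
    """Hypothetical helper: return only the direct children that are elements."""
    return [n for n in parent.childNodes if n.nodeType == n.ELEMENT_NODE]


# Every direct child, including the whitespace and "library-text" Text nodes.
all_names = [n.nodeName for n in root.childNodes]
# Only the element children, with the Text nodes skipped.
element_names = [n.nodeName for n in element_children(root)]

print(all_names)      # ['#text', 'book', '#text', 'book', '#text']
print(element_names)  # ['book', 'book']
```

Unlike getElementsByTagName, this only looks at direct children, so it will not pick up elements nested deeper in the tree.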
If you only want the book nodes, say so explicitly:
for node in dom.getElementsByTagName('book'):
    print(node.nodeName)
Output
book
book
book
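The same text-node behavior matters when you read an element's content: the text of a title sits in a child Text node, not on the element itself. A rough sketch, assuming a trimmed two-book version of the file supplied inline:

```python
import xml.dom.minidom as minidom

# Trimmed inline version of the library document (assumed for illustration).
XML = """<library>
    <book isbn="1111111111">
        <title lang="en">T1 T1 T1 T1 T1</title>
    </book>
    <book isbn="2222222222">
        <title lang="en">T2 T2 T2 T2 T2</title>
    </book>
</library>"""

dom = minidom.parseString(XML)

books = []
for book in dom.getElementsByTagName("book"):
    isbn = book.getAttribute("isbn")
    # getElementsByTagName searches all descendants of this book element.
    title_el = book.getElementsByTagName("title")[0]
    # The title's text is held by its first child, a Text node.
    title = title_el.firstChild.data
    books.append((isbn, title))

print(books)
# [('1111111111', 'T1 T1 T1 T1 T1'), ('2222222222', 'T2 T2 T2 T2 T2')]
```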
Keep in mind that using minidom is discouraged. From the official Python docs:
Users who are not already proficient with the DOM should consider using the xml.etree.ElementTree
module for their XML processing instead.
Consider using ElementTree instead:
import xml.etree.ElementTree as ET
xml_fname = "library.xml"
# ET.parse returns an ElementTree; getroot() gives the <library> element
root = ET.parse(xml_fname).getroot()
for node in root.findall('book'):
    print(node.tag)
This also outputs
book
book
book
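ElementTree avoids the #text surprise because it has no separate text nodes: text is stored on each element's .text and .tail attributes. A small sketch of where the "free text" ends up, assuming a shortened inline copy of the document:

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the library document (assumed for illustration).
XML = """<library>
    library-text
    <book isbn="1111111111"><title>T1</title></book>
    <book isbn="2222222222"><title>T2</title></book>
</library>"""

root = ET.fromstring(XML)

# Iterating over an element yields only its child elements.
tags = [child.tag for child in root]
# Text before the first child is kept in root.text.
leading_text = root.text.strip()
# Whitespace after each child's closing tag is kept in that child's .tail.
tails = [child.tail for child in root]

print(tags)          # ['book', 'book']
print(leading_text)  # library-text
print(tails)         # ['\n    ', '\n']
```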