Reading tricky XML with XPATH
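I'm a beginner with Python and XPath, and I need to use XPath to read XML whose nodes are not uniform (like the ones shown below). The format of the output to be written to a file is also shown below. The code uses the lxml library.
Please help me build the correct XPath.
Source XML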
<Classes>
<German>
<Student>
<Span><a href="">John</a></Span>
</Student>
<Student>
<Span>Adam</Span>
</Student>
</German>
<English>
<Student>
<Span>Mary</Span>
</Student>
</English>
<French>
<Student>
<Span><a href="">Anil</a></Span>
</Student>
<Student>
<Span><a href="">Jack</a></Span>
</Student>
</French>
<Spanish>
<Student>
<Span>Mary</Span>
</Student>
<Student>
<Span>Jack</Span>
</Student>
</Spanish>
</Classes>
Expected output
German
John
Adam
English
Mary
French
Anil
Jack
Spanish
Mary
Jack
Thanks,
尼克尔
This code should help:
from lxml import html
xml_content = """<Classes>
<German>
<Student>
<Span><a href="">John</a></Span>
</Student>
<Student>
<Span>Adam</Span>
</Student>
</German>
<English>
<Student>
<Span>Mary</Span>
</Student>
</English>
<French>
<Student>
<Span><a href="">Anil</a></Span>
</Student>
<Student>
<Span><a href="">Jack</a></Span>
</Student>
</French>
<Spanish>
<Student>
<Span>Mary</Span>
</Student>
<Student>
<Span>Jack</Span>
</Student>
</Spanish>
</Classes>"""
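# lxml.html lowercases tag names while parsing, so the XPath uses lowercase names
tree = html.fromstring(xml_content)
classes = tree.xpath('//classes/*')
for language_class in classes:
    # capitalize() restores the leading capital, e.g. "german" -> "German"
    print(language_class.tag.capitalize())
    # //text() collects the name whether or not it is wrapped in <a>
    for student in language_class.xpath('.//student/span//text()'):
        print(" {}".format(student))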
Output:
German
John
Adam
English
Mary
French
Anil
Jack
Spanish
Mary
Jack
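The code above goes through lxml's HTML parser, which is why the tag names in the XPath are lowercase. As an alternative, here is a minimal sketch that parses the same kind of input as XML with lxml.etree, so the tag names keep their original case. The function name format_classes and the shortened sample document are my own for illustration, not part of the question, and I'm assuming each Span holds exactly one student name.

```python
from lxml import etree

# Shortened sample with the same shape as the XML in the question.
XML_CONTENT = """<Classes>
  <German>
    <Student><Span><a href="">John</a></Span></Student>
    <Student><Span>Adam</Span></Student>
  </German>
  <English>
    <Student><Span>Mary</Span></Student>
  </English>
</Classes>"""


def format_classes(xml_text):
    """Return the class names, each followed by its student names, as a list of lines."""
    root = etree.fromstring(xml_text)
    lines = []
    # Every direct child of <Classes> is one language class (German, English, ...).
    for language_class in root.xpath('/Classes/*'):
        lines.append(language_class.tag)
        for span in language_class.xpath('./Student/Span'):
            # string(.) concatenates all descendant text, so it handles both
            # <Span>Adam</Span> and <Span><a href="">John</a></Span>.
            lines.append(span.xpath('string(.)').strip())
    return lines


if __name__ == "__main__":
    print("\n".join(format_classes(XML_CONTENT)))
```

Since the question mentions writing the result to a file, the returned list can be joined with newlines and passed to a file object's write() instead of print().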