How can I access the data in an XML node when using ElementTree

Question

I am parsing the XML found at this link:

I need to access the data inside the nodes, but the program I wrote seems to tell me there is no data inside them. Here is my code:

import urllib
import xml.etree.ElementTree as ET 

#prompt for link where xml data resides
#Use this link for testing: http://python-data.dr-chuck.net/comments_42.xml
url = raw_input('Enter URL Link: ')

#open url and prep for parsing
data = urllib.urlopen(url).read()

#read url data and convert to XML Node Tree for parsing
comments = ET.fromstring(data)

#the comment below is part of another approach to the solution
#both approaches are leading me into the same direction
#it appears as if the data inside the node is not being parsed/extracted
#counts = comments.findall('comments/comment/count')

for count in comments.findall('count'):
    print comments.find('count').text

When I print the 'data' variable by itself, I get the full XML tree. However, when I try to access the data inside a specific node, the node comes back empty.

I also tried printing the following to see what data I get back:

for child in comments:
    print child.tag, child.attrib

The output I get is:

note {} comments {}

What am I doing wrong? What am I missing?

One of the errors I get while trying different looping strategies to access the nodes is:

Traceback (most recent call last):
  File "xmlextractor.py", line 16, in <module>
    print comments.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'

Please help, thanks!!!

Update:

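While reading the Python etree documentation, I realized that my approach has been trying to 'get' the node's attributes instead of the node's content. I haven't found the answer yet, but I'm definitely getting closer!!!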
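The distinction this update points at is between an element's attributes and its text content. A minimal sketch in Python 3 syntax; the count element with an id attribute is made up purely for illustration and does not come from the question's data:

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one attribute ("id") and some text content.
elem = ET.fromstring('<count id="c1">97</count>')

print(elem.attrib)          # attributes are stored in a dict: {'id': 'c1'}
print(elem.get("id"))       # .get() reads a single attribute: c1
print(elem.text)            # .text is the content between the tags: 97
print(elem.get("missing"))  # an absent attribute gives None
```

So .get() and .attrib answer "what attributes does this tag carry?", while .text answers "what is written inside it?".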

Second update:

So I tried this code:

import urllib
import xml.etree.ElementTree as ET 

#prompt for link where xml data resides
#Use this link for testing: http://python-data.dr-chuck.net/comments_42.xml

url = raw_input('Enter URL Link: ')

#open url and prep for parsing
data = urllib.urlopen(url).read()

#read url data and convert to XML Node Tree for parsing
comments = ET.fromstring(data)

counts = comments.findall('comments/comment/count')

print len(counts)

for count in counts:
    print 'count', count.find('count').text

In the code above, when I run this line:

print len(counts)

the output shows that my counts list contains 50 nodes, but I still get the same error:

Traceback (most recent call last):
  File "xmlextractor.py", line 18, in <module>
    print 'count', count.find('count').text
AttributeError: 'NoneType' object has no attribute 'text'

I don't understand why it says there is no 'text' attribute when I am trying to access the node's content.

What am I doing wrong??

Answer 1

A few comments on your approaches:

for count in comments.findall('count'):
    print comments.find('count').text

comments.findall('count') returns an empty list, because comments has no direct child elements named count.
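To see the difference between matching direct children and matching a path or any descendant, here is a minimal Python 3 sketch. The sample document is hypothetical: it only mimics the shape suggested by the question (a root whose children are note and comments, with comment elements holding count elements), and the root tag name commentinfo and the values are assumptions:

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the question's data (root name assumed).
SAMPLE_XML = """<commentinfo>
  <note>sample</note>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>12</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE_XML)

direct = root.findall("count")                    # direct children named count only
by_path = root.findall("comments/comment/count")  # explicit path from the root
anywhere = root.findall(".//count")               # count at any depth below root

print(len(direct), len(by_path), len(anywhere))   # 0 2 2
```

The explicit path is the stricter choice when the structure is known; ".//count" is handy when it is not.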

for child in comments:
    print child.tag, child.attrib

iterates only over the direct children of the root node, which your output shows are note and comments.
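A short Python 3 sketch contrasting plain iteration (direct children only) with iter(), which walks the whole subtree. The sample document, including the root tag name commentinfo, is hypothetical and only mirrors the structure implied by the question's output:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; root tag name and values are assumptions.
SAMPLE_XML = """<commentinfo>
  <note>sample</note>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>12</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE_XML)

# Iterating an element yields only its direct children.
child_tags = [child.tag for child in root]
# iter(tag) walks the entire subtree and yields every element with that tag.
count_texts = [elem.text for elem in root.iter("count")]

print(root.tag)     # commentinfo
print(child_tags)   # ['note', 'comments']
print(count_texts)  # ['97', '12']
```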

# From update #2
for count in comments.findall('comments/comment/count'):
    print 'count', count.find('count').text

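Here, count is an Element object representing a single count node, and that node does not itself contain any count nodes. Therefore count.find('count') returns None, and accessing .text on None raises the AttributeError you see.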
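A minimal Python 3 sketch of why the lookup fails, plus findtext() as one way to get a default instead of an exception. The sample document and the "n/a" default are hypothetical choices for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; root tag name and values are assumptions.
SAMPLE_XML = """<commentinfo>
  <comments>
    <comment><name>A</name><count>97</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(SAMPLE_XML)
count = root.find("comments/comment/count")  # first matching count element

print(count.text)                              # 97: the text is already here
print(count.find("count"))                     # None: count has no count child
# findtext() returns the default instead of failing when nothing matches.
print(count.findtext("count", default="n/a"))  # n/a
```

In other words, once you hold the count element, read count.text directly rather than searching inside it again.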

If I understand correctly, your goal is to retrieve the text value of each count node. Here are two ways to do that:

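# Option 1: select the count elements by path and read their text directly
for count in comments.findall('comments/comment/count'):
    print count.text

# Option 2: iterate over every comment element and look up its count child
for comment in comments.iter('comment'):
    print comment.find('count').text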
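The loops above use Python 2 print statements. Below is a Python 3 sketch of the same two approaches, wrapped in functions (the names counts_by_path and counts_by_comment are hypothetical) so they can be tried on an inline sample instead of the live URL. To use the real data, the bytes returned by urllib.request.urlopen(url).read() can be passed in, since ET.fromstring accepts bytes as well as str:

```python
import xml.etree.ElementTree as ET

def counts_by_path(xml_text):
    """Option 1: select count elements by path, then read each one's text."""
    root = ET.fromstring(xml_text)
    return [count.text for count in root.findall("comments/comment/count")]

def counts_by_comment(xml_text):
    """Option 2: iterate comment elements, then find each one's count child."""
    root = ET.fromstring(xml_text)
    return [comment.find("count").text for comment in root.iter("comment")]

# Hypothetical sample data; root tag name and values are assumptions.
SAMPLE_XML = """<commentinfo>
  <note>sample</note>
  <comments>
    <comment><name>A</name><count>97</count></comment>
    <comment><name>B</name><count>12</count></comment>
  </comments>
</commentinfo>"""

print(counts_by_path(SAMPLE_XML))     # ['97', '12']
print(counts_by_comment(SAMPLE_XML))  # ['97', '12']
```

Both return the text as strings; wrap each value in int() if you need numbers.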
