ElementTree find outputs empty text
I'm having trouble extracting text with ElementTree.
My XML file looks like this:
<elecs id = 'elecs'>
<elec id = "CLM-0001" num = "0001">
<elec-text> blah blah blah </elec-text>
<elec-text> blah blah blah </elec-text>
</elec>
<elec id = "CLM-0002" num = "0002">
<elec-text> blah blah blah </elec-text>
<elec-text> blah blah blah </elec-text>
</elec>
</elecs>
I want to extract all the text inside the tags.
Suppose the path to the XML file is stored in the variable xml:
import xml.etree.ElementTree as ET
from lxml import etree
parser = etree.XMLParser(recover=True)
contents = open(xml).read()
tree = ET.fromstring(contents, parser=parser)
elecsN = tree.find('elecs')
for element in elecsN:
    print(element.text)
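The problem is that the code above returns empty strings. I have tried the same code with other tags in my document and it works. I don't know why it returns empty strings this time.
Is there any way to fix this?
Thanks a lot.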
If you really mean 'any way', you can use lxml.
>>> from io import StringIO
>>> html = StringIO('''\
... <elecs id = 'elecs'>
... <elec id = "CLM-0001" num = "0001">
... <elec-text> blah blah blah </elec-text>
... <elec-text> blah blah blah </elec-text>
... </elec>
... <elec id = "CLM-0002" num = "0002">
... <elec-text> blah blah blah </elec-text>
... <elec-text> blah blah blah </elec-text>
... </elec>
... </elecs>
... '''
... )
>>> from lxml import etree
>>> doc = etree.parse(html)
>>> doc.xpath('//elecs/elec/*/text()')
[' blah blah blah ', ' blah blah blah ', ' blah blah blah ', ' blah blah blah ']
You can simply look up, by name, the elements that directly contain the text, which is elec-text in this case:
>>> elec_texts = tree.findall('.//elec-text')
>>> for elec_text in elec_texts:
...     print(elec_text.text)
...
blah blah blah
blah blah blah
blah blah blah
blah blah blah
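To make the cause concrete, here is a minimal sketch using only the standard library. It assumes the sample document from the question is the whole file, so that elecs is the root element. Under that assumption, the .text of each elec is just the whitespace before its first child, which prints as an apparently empty string, while the visible text sits in the elec-text children. The variable names (XML, root, own_text, texts, all_text) are illustrative and not taken from the question.

```python
import xml.etree.ElementTree as ET

# Sample document copied from the question (assumed to be the whole file).
XML = """\
<elecs id='elecs'>
  <elec id="CLM-0001" num="0001">
    <elec-text> blah blah blah </elec-text>
    <elec-text> blah blah blah </elec-text>
  </elec>
  <elec id="CLM-0002" num="0002">
    <elec-text> blah blah blah </elec-text>
    <elec-text> blah blah blah </elec-text>
  </elec>
</elecs>
"""

root = ET.fromstring(XML)  # root is the <elecs> element itself

# Each <elec> holds only whitespace before its first child,
# so its own .text looks empty when printed.
own_text = [elec.text for elec in root]

# The visible text lives one level down, in the <elec-text> children.
texts = [el.text.strip() for el in root.iter("elec-text")]

# Alternative: itertext() walks all descendants and yields every text
# fragment; drop the whitespace-only ones.
all_text = [t.strip() for t in root.itertext() if t.strip()]

print(own_text)
print(texts)
```

Note that when elecs is the root, root.find('elecs') returns None, because find searches the children of the element it is called on. If elecs is nested deeper in the real file, './/elecs' or iter('elecs') would be the usual way to reach it.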