Parse and count numeric-only XML text, including e-00 or e+01
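I'm new to Python. I'm trying to parse an XML file and count all numeric text entries, including approximations written with e- or e+. For example, given the following sample (jerry.xml):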
```xml
<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <language>english</language>
    <currency>1.21$/kg</currency>
    <gdppc>141100</gdppc>
    <gdpnp>2.304e+0150</gdpnp>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <language>english</language>
    <currency>4.1$/kg</currency>
    <gdppc>59900</gdppc>
    <gdpnp>5.2e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>
```
I want to return 6, counting 2, 141100, 2.304e+0150, 5, 59900 and 5.2e-015 while skipping english, 1.21$/kg and 4.1$/kg.
Any help would be appreciated. Right now I have the following:
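```python
import xml.etree.ElementTree as ET

tree = ET.parse("jerry.xml")
root = tree.getroot()
for text in root.itertext():
    print(repr(text))  # every text fragment, in document order

# Not working yet: `file` and `xmldoc` are undefined here, and
# firstChild/getElementsByTagName are xml.dom.minidom APIs, not ElementTree.
# charlie = file.writelines(root.itertext())
# count = sum(element.firstChild.nodeValue.find(r'\d+$') for element in xmldoc.getElementsByTagName('jerry.xml'))
```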
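The `count = ...` line of this attempt passes a regular expression to `str.find`, which only searches for the literal characters `\d+$` and therefore returns -1 for text like `141100`. Below is a minimal sketch of the regex check that line seems to be aiming for, using the standard `re` module. The `NUMBER` pattern and the `is_number` helper are hypothetical names, and the pattern assumes a "number" means an optionally signed decimal with an optional exponent such as `e-015` or `e+0150`.

```python
import re

# Assumed definition of "numeric": optional sign, digits with an optional
# fractional part (or a leading-dot fraction), and an optional exponent.
NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def is_number(text):
    """Return True if the whole text, ignoring surrounding whitespace, is a number."""
    return NUMBER.fullmatch(text.strip()) is not None


samples = ["2", "english", "1.21$/kg", "141100", "2.304e+0150", "5.2e-015"]

# str.find looks for the literal characters "\d+$", not a pattern
print("141100".find(r"\d+$"))  # -1

print([s for s in samples if is_number(s)])
```

Because `fullmatch` must consume the whole string, `1.21$/kg` is rejected even though it starts with a valid number.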
You can simply try converting each inner text fragment to a float and ignore any errors:
```python
import xml.etree.ElementTree as ET

tree = ET.parse("temp.txt")
root = tree.getroot()
nums = []
for e in root.itertext():
    try:
        nums.append(float(e))
    except ValueError:
        # Non-numeric text such as "english", "1.21$/kg" or whitespace
        pass
print(nums)
print(len(nums))
```
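To check the float-based approach without a file on disk, here is a self-contained sketch that inlines the question's sample (assuming the document is closed with `</data>`) and parses it with `ET.fromstring`. `SAMPLE` and `numeric_texts` are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# The question's sample, inlined so the sketch runs without jerry.xml
SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank updated="yes">2</rank>
    <language>english</language>
    <currency>1.21$/kg</currency>
    <gdppc>141100</gdppc>
    <gdpnp>2.304e+0150</gdpnp>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank updated="yes">5</rank>
    <language>english</language>
    <currency>4.1$/kg</currency>
    <gdppc>59900</gdppc>
    <gdpnp>5.2e-015</gdpnp>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""


def numeric_texts(root):
    """Collect every text fragment that float() accepts."""
    nums = []
    for text in root.itertext():
        try:
            nums.append(float(text))
        except ValueError:
            # "english", "1.21$/kg" and whitespace-only fragments land here
            pass
    return nums


root = ET.fromstring(SAMPLE)
nums = numeric_texts(root)
print(nums)
print(len(nums))
```

One design caveat: `float()` also accepts strings such as `nan`, `inf` and `Infinity`, so if such values could appear in the data and should not be counted, a stricter regular-expression check may be a better fit.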
As requested, here is a possibly inefficient but working way to keep track of each element's position:
```python
import xml.etree.ElementTree as ET

def extractNumbers(path, node):
    nums = []
    # Build a path like /data/country=Liechtenstein/gdppc
    path += '/' + node.tag
    if 'name' in node.keys():
        path += '=' + node.attrib['name']
    try:
        num = float(node.text)
        nums.append((path, num))
    except (ValueError, TypeError):
        # ValueError: non-numeric text; TypeError: node.text is None
        pass
    # Recurse into child elements
    for e in list(node):
        nums.extend(extractNumbers(path, e))
    return nums

tree = ET.parse('temp.txt')
nums = extractNumbers('', tree.getroot())
print(len(nums))
print(nums)
for n in nums:
    print(n[0], n[1])
```
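If the document might be nested very deeply, the recursive `extractNumbers` can hit Python's recursion limit. A minimal sketch of the same idea with an explicit stack follows; `extract_numbers` is a hypothetical name, and it keeps the answer's `/tag=name` path format and float-based test. It is not necessarily faster, it only avoids recursion.

```python
import xml.etree.ElementTree as ET


def extract_numbers(root):
    """Return (path, value) pairs for every element whose text parses as a float."""
    results = []
    stack = [("", root)]  # (parent path, element) pairs still to visit
    while stack:
        parent_path, node = stack.pop()
        path = parent_path + "/" + node.tag
        if "name" in node.attrib:
            path += "=" + node.attrib["name"]
        try:
            results.append((path, float(node.text)))
        except (ValueError, TypeError):
            pass  # non-numeric text, or no text at all (node.text is None)
        # Push children in reverse so they are popped in document order
        for child in reversed(list(node)):
            stack.append((path, child))
    return results


root = ET.fromstring(
    '<data><country name="Singapore"><rank>5</rank>'
    '<language>english</language><gdpnp>5.2e-015</gdpnp></country></data>'
)
for path, value in extract_numbers(root):
    print(path, value)
```

Pushing children in reverse order makes the stack pop them first-to-last, so the results come out in the same pre-order sequence as the recursive version.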