Error in XML file reading via xml.etree
I'm trying to read an XML file in Python with xml.etree, but for some files I get a MemoryError while parsing. My XML file is 912 MB. Is this problem related to the file size?
Code:
```python
from xml.etree import ElementTree
with open('F:\Reports\Logs\AppPerfect_States\TG1_GM\Result_TG1_V16.xml', 'rt') as f1:
    tree = ElementTree.parse(f1)
```
Error:
```
Traceback (most recent call last):
  File "<pyshell#3>", line 2, in <module>
    tree = ElementTree.parse(f1)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 653, in parse
    data = source.read(65536)
MemoryError
```
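On the size question: `ElementTree.parse` builds the entire document as Python objects before it returns, and that tree is normally several times larger than the file on disk. The failing call in the traceback only reads a 64 KB chunk, which suggests memory was already exhausted by the tree built so far. The sketch below (Python 3, standard library only) illustrates the overhead on a small synthetic document; the `tree_memory_ratio` name and the element shape are made up for the illustration, and the exact ratio depends on the Python version and on the document.

```python
import tracemalloc
import xml.etree.ElementTree as ET


def tree_memory_ratio(n_items=20000):
    """Build a synthetic XML document and compare its size in bytes with
    the peak memory traced while ElementTree parses it into a full tree."""
    # Hypothetical document shape: <root><item id="0">value 0</item>...</root>
    data = b"<root>" + b"".join(
        b'<item id="%d">value %d</item>' % (i, i) for i in range(n_items)
    ) + b"</root>"

    tracemalloc.start()
    root = ET.fromstring(data)  # builds the complete tree in memory
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    assert len(root) == n_items  # every <item> is now a live Element object
    return len(data), peak


size, peak = tree_memory_ratio()
print("document: %d bytes, peak while parsing: %d bytes (%.1fx)"
      % (size, peak, peak / size))
```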
Update: Following many suggestions, I tried lxml.
Code:
```python
from lxml import etree
context = etree.iterparse('F:\Reports\Logs\AppPerfect_States\TG1_GM\Result_TG1_V16.xml', tag="document")
for event, element in context:
    for child in element:
        print child.tag, child.text
    element.clear()
```
Error:
```
C:\Python27\python.exe "F:/Py Projects/V16_AUTO/test1/xmlparsingtest1.py"
Traceback (most recent call last):
  File "F:/Py Projects/V16_AUTO/test1/xmlparsingtest1.py", line 3, in <module>
    for event, element in context:
  File "iterparse.pxi", line 207, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:126137)
lxml.etree.XMLSyntaxError: unknown error, line 7530730, column 33
```
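The `XMLSyntaxError` gives a position (line 7530730, column 33), so one way to find out what the parser tripped over is to look at the raw bytes there without loading the 912 MB file. A minimal Python 3 sketch; `show_xml_context` is a hypothetical helper, and it assumes the reported line and column are 1-based.

```python
import itertools


def show_xml_context(path, line_no, column, width=40):
    """Return the raw bytes around (line_no, column) of a large file
    without reading the whole file into memory.

    Assumes line_no and column are 1-based, as in the parser's message.
    """
    with open(path, "rb") as f:
        # islice streams past the preceding lines one at a time
        line = next(itertools.islice(f, line_no - 1, line_no), None)
    if line is None:
        raise ValueError("file has fewer than %d lines" % line_no)
    start = max(column - 1 - width, 0)
    return line[start:column - 1 + width]
```

For example, `show_xml_context(r'F:\Reports\Logs\AppPerfect_States\TG1_GM\Result_TG1_V16.xml', 7530730, 33)` returns the bytes surrounding the reported position, so a stray control character or a malformed tag at that spot can be inspected directly. Reading in binary mode avoids a second decoding error on the very bytes being investigated.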
Update 2: I tried cElementTree.
Code:
```python
import xml.etree.cElementTree as etree
xmL = 'F:\Reports\Logs\Result_TG1_V16.xml'
context = etree.iterparse(xmL, events=("start", "end"))
context = iter(context)
event, root = context.next()
for event, elem in context:
    if event == 'TasksReportNode':
        print elem.tag
        print elem.text
        root.clear()
```
Error:
```
Exception MemoryError: in ignored
Exception MemoryError: in ignored
Exception MemoryError: in ignored
Exception MemoryError: in ignored
Exception MemoryError: in ignored
MemoryError
```
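In the cElementTree attempt, `event` is only ever `'start'` or `'end'`, so `event == 'TasksReportNode'` is never true: that branch never runs, the `root.clear()` inside it never frees anything, and the tree grows just as it would with `ElementTree.parse`. Below is a sketch of the same root-clearing pattern with the test corrected, written for Python 3, where `xml.etree.ElementTree` uses the C accelerator automatically (`cElementTree` was deprecated in 3.3 and removed in 3.9). The function name and the choice to yield `(tag, text)` pairs are illustrative.

```python
import io
import xml.etree.ElementTree as ET


def stream_tasks_report_nodes(source, tag="TasksReportNode"):
    """Yield (tag, text) for every <tag> element, clearing the tree as it goes.

    `source` may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # Compare the element's tag, not the event name, and wait for "end"
        # so that the element's text and children have been parsed.
        if event == "end" and elem.tag == tag:
            yield elem.tag, elem.text
            # Drop everything attached to the root so far to bound memory.
            root.clear()


sample = io.BytesIO(b"<Report><TasksReportNode>a</TasksReportNode></Report>")
for tag, text in stream_tasks_report_nodes(sample):
    print(tag, text)
```

Clearing the root rather than only `elem` matters: after `elem.clear()` the emptied element would still hang off its parent, and millions of such shells add up in a file this size.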
```python
import xml.etree.ElementTree as ET
tree = ET.ElementTree(file="xyz.xml")
for elem in tree.iter():
    print(elem.attrib)
```
Try reading your file with this code; it may help.
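Note that `ET.ElementTree(file=...)` parses the whole file before `iter()` yields anything, so it has the same memory profile as the `ElementTree.parse` call in the question. A streaming variant of the same attribute dump, as a Python 3 sketch (the `iter_attributes` name is hypothetical), has to copy each element's attributes before clearing it, because `clear()` empties `attrib` in place:

```python
import io
import xml.etree.ElementTree as ET


def iter_attributes(source):
    """Stream (tag, attributes) pairs in the order elements close,
    without keeping the whole tree in memory."""
    for _, elem in ET.iterparse(source, events=("end",)):
        # Copy first: elem.clear() resets text, tail, attrib and children.
        yield elem.tag, dict(elem.attrib)
        elem.clear()


sample = io.BytesIO(b'<a x="1"><b y="2"/></a>')
for tag, attrs in iter_attributes(sample):
    print(tag, attrs)
```

Elements arrive in closing-tag order (children before their parent). Each cleared element stays attached to its parent as an empty shell until the parent closes, so for a very flat file the root-clearing pattern keeps memory lower.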
Here is what I tried: I used lxml.
```python
from lxml import etree

# Raw string so the backslashes in the Windows path are never read as escapes
xmL = r'F:\Reports\Logs\Result_TG1_V16.xml'
# "end" events only: an element's children are guaranteed complete only when it ends
context = etree.iterparse(xmL, events=("end",))
for event, element in context:
    if element.tag == 'TasksReportNode':
        for child1 in element:
            for child2 in child1:
                if child2.get("RowCount") == "0":
                    for child3 in child2:
                        print(child3.tag, child3.text)
        element.clear()  # discard the element
del context
```
With this I was able to parse all the tags and retrieve the data I needed.
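For comparison, the same filtering can be done with the standard library alone. This Python 3 sketch is a translation of the lxml loop in this answer under a few assumptions: it listens only for "end" events, and it replaces the three nested loops with one ElementPath expression, `*/*[@RowCount='0']/*`, using the `[@attr='value']` predicate that ElementTree's `iterfind` supports. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def rows_with_zero_count(source, tag="TasksReportNode"):
    """Yield (tag, text) for each child of a RowCount="0" grandchild of <tag>."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # child1 = "*", child2 = "*[@RowCount='0']", child3 = "*"
            for child3 in elem.iterfind("*/*[@RowCount='0']/*"):
                yield child3.tag, child3.text
            elem.clear()  # discard the processed subtree


sample = io.BytesIO(
    b"<Report><TasksReportNode><Table>"
    b"<Rows RowCount='0'><Cell>x</Cell></Rows>"
    b"</Table></TasksReportNode></Report>"
)
for tag, text in rows_with_zero_count(sample):
    print(tag, text)
```

As in the lxml version, only `TasksReportNode` subtrees are cleared; anything outside them stays in memory until parsing ends.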