Get better parse error message from ElementTree
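If I try to parse broken XML, the exception shows the line number. Is there a way to show the XML context as well?
I want to see the XML tags before and after the broken part.
Example: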
import xml.etree.ElementTree as ET
tree = ET.fromstring('<a><b></a>')
Exception:
Traceback (most recent call last):
  File "tmp/foo.py", line 2, in <module>
    tree = ET.fromstring('<a><b></a>')
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8
Something like this would be nice:
ParseError:
<a><b></a>
=====^
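This isn't the best option, but it is simple: you can parse the ParseError message
to extract the line and column, then use them to show where the problem is.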
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import ParseError

my_string = '<a><b><c></b></a>'
try:
    tree = ET.fromstring(my_string)
except ParseError as e:
    # str(e) has the form "<message>: line <N>, column <M>".
    formatted_e = str(e)
    line = int(formatted_e[formatted_e.find("line ") + 5: formatted_e.find(",")])
    column = int(formatted_e[formatted_e.find("column ") + 7:])
    split_str = my_string.split("\n")
    # Print the offending line, then dashes up to the reported column and a caret.
    print("{}\n{}^".format(split_str[line - 1], len(split_str[line - 1][0:column]) * "-"))
Note: splitting on \n is only for the sake of the example; you need to split the text into lines the correct way.
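As a rough sketch of what "the correct way" could look like: the version below reads the location from the exception's position attribute instead of slicing the message text, and splits with str.splitlines() so that \r\n line endings are handled too. The function name describe_parse_error is made up for this example, and it assumes the input is a str rather than bytes.

```python
import xml.etree.ElementTree as ET


def describe_parse_error(xml_text):
    """Return None if xml_text parses, else the offending line with a caret.

    describe_parse_error is a hypothetical helper name, not part of ElementTree.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as e:
        # position is a (line, column) tuple; line is 1-based, column 0-based.
        line, column = e.position
        # splitlines() splits on \n, \r\n and \r (plus a few other boundaries).
        lines = xml_text.splitlines()
        # The parser can report a line past the end (e.g. truncated input).
        bad_line = lines[line - 1] if line <= len(lines) else ""
        return "{}\n{}^".format(bad_line, "-" * column)
    return None


print(describe_parse_error("<a>\n  <b>\n</a>"))
```

Returning the text instead of printing it makes the helper easier to reuse in logging or in a re-raised exception.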
You could create a helper function to do this:
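import sys
import io
import itertools as IT
import xml.etree.ElementTree as ET

PY2 = sys.version_info[0] == 2
# A Python 2 str is bytes and needs BytesIO; a Python 3 str needs StringIO.
StringIO = io.BytesIO if PY2 else io.StringIO

def myfromstring(content):
    try:
        tree = ET.fromstring(content)
    except ET.ParseError as err:
        lineno, column = err.position
        # Skip the first lineno - 1 lines to reach the offending (1-based) line.
        line = next(IT.islice(StringIO(content), lineno - 1, None), '').rstrip('\r\n')
        # Right-align '^' in a field of width `column`, padded with '='.
        caret = '{:=>{}}'.format('^', column)
        err.msg = '{}\n{}\n{}'.format(err, line, caret)
        raise
    return tree

myfromstring('<a><b></a>')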
yields
xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8
<a><b></a>
=======^
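The question also asks for the XML before and after the broken part. A possible extension, sketched here under a few assumptions, prints some neighbouring lines with line numbers and puts the caret under the failing line. The names format_context and fromstring_with_context and the context parameter are invented for this example; it assumes the content is a str and does not try to align the caret when the line contains tabs.

```python
import xml.etree.ElementTree as ET


def format_context(content, lineno, column, context=2):
    """Render the lines around (lineno, column) with a caret under the error.

    lineno is 1-based and column is 0-based, as in ParseError.position.
    """
    lines = content.splitlines()
    first = max(lineno - 1 - context, 0)
    last = min(lineno + context, len(lines))
    out = []
    for i in range(first, last):
        out.append("{:>4}: {}".format(i + 1, lines[i]))
        if i == lineno - 1:
            # Six characters of prefix ("NNNN: ") precede the line text.
            out.append(" " * 6 + "=" * column + "^")
    return "\n".join(out)


def fromstring_with_context(content, context=2):
    """Like ET.fromstring, but appends nearby lines to the error message."""
    try:
        return ET.fromstring(content)
    except ET.ParseError as err:
        lineno, column = err.position
        err.msg = "{}\n{}".format(
            err, format_context(content, lineno, column, context))
        raise
```

Keeping the formatting in its own function means it can be called directly with a known position, which is handy when the snippet should go to a log rather than into the exception.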