Python: how to know exactly which tag is not closed in XML
I have an XML file, and I check that it is well-formed like this:

    try:
        self.doc = etree.parse(attributesXMLFilePath)
    except IOError:
        error_message = "Error: Couldn't find attribute XML file path {0}".format(attributesXMLFilePath)
        raise XMLFileNotFoundException(error_message)
    except XMLSyntaxError:
        error_message = "The file {0} is not a good XML file, recheck please".format(attributesXMLFilePath)
        raise NotGoodXMLFormatException(error_message)
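As you can see, I am catching XMLSyntaxError, which is imported from lxml:

    from lxml.etree import XMLSyntaxError

This works well, but it only tells me whether the file is well-formed XML. Is there a way to find out which tag is wrong? For example, when I have this:

    <name>Marco</name1>

I get the error. Is there a way to know that the name tag has not been closed?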
Update

After someone gave me the idea of using the line and position, I came up with this code:

    class XMLFileNotFoundException(GeneralSpiderException):
        def __init__(self, message):
            super(XMLFileNotFoundException, self).__init__(message, self)

    class GeneralSpiderException(Exception):
        def __init__(self, message, e):
            super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))

I am still raising the error like this:

    raise XMLFileNotFoundException(error_message)

Now I get this error:

    super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))
    exceptions.AttributeError: 'XMLFileNotFoundException' object has no attribute 'lineno'
You can print the details of the error. For example:

    try:
        self.doc = etree.parse(attributesXMLFilePath)
    except XMLSyntaxError as e:
        # e.msg holds the parser's description of the problem
        error_message = "The file {0} is not correct XML, {1}".format(attributesXMLFilePath, e.msg)
        raise NotGoodXMLFormatException(error_message)
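
The AttributeError in the question's update comes from passing `self` (the exception being constructed) as `e`, so there is no `lineno` to read. Below is a minimal sketch of one way around that, assuming the caught exception is handed to the custom exception instead. The `cause` parameter and the `load_xml` helper are hypothetical names used for illustration, and the fallback import of the standard library's `xml.etree.ElementTree` is only there so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the parser used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback (assumption)


class GeneralSpiderException(Exception):
    def __init__(self, message, cause=None):
        # `cause` is the exception that was actually caught (hypothetical name).
        # Only parse errors carry a position; IOError does not, hence getattr.
        position = getattr(cause, "position", None)
        if position is not None:
            line, column = position
            message = "{0}\nline = {1}, column = {2}".format(message, line, column)
        super(GeneralSpiderException, self).__init__(message)
        self.cause = cause


class XMLFileNotFoundException(GeneralSpiderException):
    pass


class NotGoodXMLFormatException(GeneralSpiderException):
    pass


def load_xml(path):
    """Parse `path`, translating low-level errors into the custom exceptions."""
    try:
        return etree.parse(path)
    except IOError as e:
        raise XMLFileNotFoundException(
            "Couldn't find attribute XML file path {0}".format(path), e)
    except etree.ParseError as e:
        # lxml's XMLSyntaxError is a subclass of its ParseError
        raise NotGoodXMLFormatException(
            "The file {0} is not correct XML, {1}".format(path, e.msg), e)
```

With this shape, a missing file produces a message without any line information, while a malformed file produces a message with the line and column appended.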
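This may not be exactly what you want, but you can get the exact line and column where the error was detected from the exception:

    import io
    import lxml.etree

    xml_fragment = b"<name>Marco</name1>"
    #                12345678901234
    try:
        lxml.etree.parse(io.BytesIO(xml_fragment))
    except lxml.etree.XMLSyntaxError as exc:
        line, column = exc.position

In this example, line and column will be 1 and 14, indicating the first character of the end tag that has no matching start tag.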
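
The same idea can be wrapped in a small helper that returns the position together with the parser's message, since the message typically names the mismatched tags. This is only a sketch: `locate_syntax_error` is a hypothetical name, and the fallback to the standard library's `xml.etree.ElementTree` is an assumption added so the example runs without lxml. The two parsers may count columns differently, so the column is best treated as approximate.

```python
try:
    from lxml import etree  # the parser used in this thread
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback (assumption)


def locate_syntax_error(xml_bytes):
    """Return (line, column, message) for the first syntax error, or None if well-formed."""
    try:
        etree.fromstring(xml_bytes)
    except etree.ParseError as exc:
        # Both lxml and the standard library expose `position` as (line, column).
        line, column = exc.position
        return line, column, str(exc)
    return None


if __name__ == "__main__":
    print(locate_syntax_error(b"<name>Marco</name1>"))
```

A well-formed document returns None, so a caller can use the result directly in an `if` check before building an error message.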