Python: how to find out exactly which tag is not closed in XML

I have an XML file, and I check whether it is really well-formed XML like this:

try:
    self.doc = etree.parse(attributesXMLFilePath)
except IOError:
    error_message = "Error: Couldn't find attribute XML file path {0}".format(attributesXMLFilePath)
    raise XMLFileNotFoundException(error_message)
except XMLSyntaxError:
    error_message = "The file {0} is not a good XML file, recheck please".format(attributesXMLFilePath)
    raise NotGoodXMLFormatException(error_message)

As you can see, I am catching XMLSyntaxError, which is imported like this:

from lxml.etree import XMLSyntaxError

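This works well, but it only tells me whether or not the file is well-formed XML. What I would like to know is whether there is a way to find out which tag is wrong, because when I do this: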

<name>Marco</name1>

I get the error. Is there a way to know that the name tag has not been closed?

Update

After someone gave me the idea of using the line and position, I came up with this code:

class XMLFileNotFoundException(GeneralSpiderException):
    def __init__(self, message):
        super(XMLFileNotFoundException, self).__init__(message, self)

class GeneralSpiderException(Exception):
    def __init__(self, message, e):
        super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))

and I am still raising the error like this:

raise XMLFileNotFoundException(error_message)

Now I get this error:

    super(GeneralSpiderException, self).__init__(message+" \nline of Exception = {0}, position of Exception = {1}".format(e.lineno, e.position))
exceptions.AttributeError: 'XMLFileNotFoundException' object has no attribute 'lineno'

You can print the details of the error. For example:

try:
    self.doc = etree.parse(attributesXMLFilePath)
except XMLSyntaxError as e:
    error_message = "The file {0} is not correct XML, {1}".format(attributesXMLFilePath, e.msg)
    raise NotGoodXMLFormatException(error_message)
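
As for the AttributeError in the question's update: it most likely comes from passing `self` (the new XMLFileNotFoundException, which has no `lineno`) to the base class instead of the parser exception that was caught. Below is a minimal sketch of one way around that. It is not the asker's actual code: the optional `cause` parameter, the `parse_attributes` helper, and the use of the standard library's `xml.etree.ElementTree` (so the snippet runs without lxml) are all assumptions made for illustration. `lxml.etree.XMLSyntaxError` exposes a `position` attribute as well, so the same pattern should carry over.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


class GeneralSpiderException(Exception):
    """Base exception that appends the parser position when one is available."""

    def __init__(self, message, cause=None):
        # `cause` is a hypothetical parameter: the exception that was caught.
        # Only parser errors carry a position; IOError and similar do not.
        position = getattr(cause, "position", None)
        if position is not None:
            line, column = position
            message += "\nline = {0}, column = {1}".format(line, column)
        super().__init__(message)
        self.cause = cause


class XMLFileNotFoundException(GeneralSpiderException):
    pass


class NotGoodXMLFormatException(GeneralSpiderException):
    pass


def parse_attributes(xml_text):
    """Hypothetical helper: parse xml_text or raise NotGoodXMLFormatException."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Pass the caught exception along, not `self`.
        raise NotGoodXMLFormatException(
            "Not well-formed XML: {0}".format(exc), exc
        )
```

With this shape, `XMLFileNotFoundException(error_message)` can still be raised with a message alone, because the position is only read when the cause actually has one.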

This may not be exactly what you want, but you can get the exact line and column where the error was detected from the exception:

import io
import lxml.etree

xml_fragment = u"<name>Marco</name1>"
#                12345678901234
try:
    # io.StringIO replaces the Python 2-only StringIO module
    lxml.etree.parse(io.StringIO(xml_fragment))
except lxml.etree.XMLSyntaxError as exc:
    # position is a (line, column) tuple
    line, column = exc.position

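In this example, line and column would be 1 and 14, pointing at the first character of the name in the closing tag that has no matching opening tag (the exact column may vary between parser versions).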
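
Once the line number is known, it can be used to show the offending source line next to the parser's message. The sketch below is only an illustration under stated assumptions: `describe_xml_error` is a hypothetical helper name, and it uses the standard library's `xml.etree.ElementTree.ParseError` (which also has a `position` attribute) rather than lxml, so it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)


def describe_xml_error(xml_text):
    """Return None if xml_text is well formed, otherwise a short report.

    Hypothetical helper: the report holds the parser message plus the
    source line on which the parser detected the problem.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        line, _column = exc.position  # line numbers start at 1
        lines = xml_text.splitlines()
        # Guard against a line number past the end of the text
        # (for example, an unclosed tag detected at end of input).
        source_line = lines[line - 1] if 0 < line <= len(lines) else ""
        return "{0}\n  line {1}: {2}".format(exc, line, source_line)
    return None


if __name__ == "__main__":
    print(describe_xml_error("<root>\n<name>Marco</name1>\n</root>"))
```

The parser reports where it noticed the mismatch, which is usually the closing tag and not necessarily the tag that was left open, so the report is a starting point for inspection, not a definitive diagnosis.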