What is org.xml.sax.SAXException: Scanner State 24 not Recognized?

Question

I am getting the following exception, but I cannot find any documentation specific to it:

org.xml.sax.SAXException: Scanner State 24 not Recognized
       at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:271)
       at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
       at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)

Any help, or a pointer to the right resource, would be much appreciated.

Answer 1

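This exception is produced by Sun's version of the Xerces parser during DOM parsing. Unfortunately, the specific block of code that raised the exception is obscured (the following is paraphrased from the JDK 8.0 DOMParser source code):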

public void parse(InputSource inputSource) throws SAXException, IOException
{
    try
    {
        XMLInputSource xmlInputSource = new XMLInputSource(inputSource.getPublicId(), inputSource.getSystemId(), null);
        xmlInputSource.setByteStream(inputSource.getByteStream());
        xmlInputSource.setCharacterStream(inputSource.getCharacterStream());
        xmlInputSource.setEncoding(inputSource.getEncoding());
        parse(xmlInputSource); // <-- Original XNIException is thrown in here
    } catch (XMLParseException e) {
        ...
    } catch (XNIException e) { // <-- wrap XNI exceptions as SAX exceptions
        Exception ex = e.getException();
        if (ex == null) { throw new SAXException(e.getMessage()); }
        if (ex instanceof SAXException) { throw (SAXException) ex; }
        if (ex instanceof IOException) { throw (IOException) ex; }
        throw new SAXException(ex); // <-- Note: The original stack trace is lost here.
    }
}
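
To see how much information survived the wrapping shown above, it can help to print every exception nested inside the SAXException you caught. The following is only a minimal sketch: the class name SaxCauseDump and the method describe are hypothetical helpers, not part of the JDK. It relies on SAXException.getException() and Throwable.getCause(), both of which are standard. If the parser took the `ex == null` branch, expect a single line containing just the message.

```java
import org.xml.sax.SAXException;

final class SaxCauseDump {
    /** Returns one line per exception in the chain, outermost first. */
    static String describe(SAXException e) {
        StringBuilder sb = new StringBuilder();
        Throwable current = e;
        int depth = 0;
        // Cap the depth so a self-referencing chain cannot loop forever.
        while (current != null && depth < 16) {
            sb.append(depth).append(": ")
              .append(current.getClass().getName()).append(": ")
              .append(current.getMessage()).append('\n');
            Throwable next = null;
            if (current instanceof SAXException) {
                // SAXException keeps its wrapped exception here.
                next = ((SAXException) current).getException();
            }
            if (next == null) {
                next = current.getCause();
            }
            if (next == current) {
                break;
            }
            current = next;
            depth++;
        }
        return sb.toString();
    }
}
```

Calling SaxCauseDump.describe(e) in the catch block around DocumentBuilder.parse() and logging the result shows whether there is a nested exception worth chasing before you reach for a debugger.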

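Because the original stack trace is obscured, there are really only two ways to determine the actual cause of this kind of parsing exception: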

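1. Attach a debugger to the DOMParser.parse() method and step through the code to see where the exception is originally thrown.
2. Remove elements from your XML document until the parsing error no longer occurs, then add them back in small increments to determine which element, attribute, processing instruction, etc. triggers the parser error.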
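
Before bisecting the document by hand, it may be worth checking whether the parser can report a position itself. The sketch below registers a standard org.xml.sax.ErrorHandler on the DocumentBuilder and records the line and column of anything reported. The class name ParseErrorLocator and the method findProblems are hypothetical. This is not guaranteed to help here: an internal scanner failure such as "Scanner State 24 not Recognized" may bypass the ErrorHandler entirely, in which case the sketch records only an "unlocated" entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ParseErrorLocator {
    /** Parses the XML text and returns every problem the parser reported. */
    static List<String> findProblems(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) {
                    problems.add(format("warning", e));
                }
                @Override public void error(SAXParseException e) {
                    problems.add(format("error", e));
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    problems.add(format("fatal", e));
                    throw e; // fatal errors cannot be recovered from
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The failure never went through the handler, so no location is known.
            if (problems.isEmpty()) {
                problems.add("unlocated: " + e.getMessage());
            }
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    private static String format(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }
}
```

If a line and column do come back, they give a starting point for the bisection in step 2; if only an "unlocated" entry appears, the debugger or manual bisection remains the way to go.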

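If you are not interested in finding the source of the error and just want it to go away, you can try a different XML parser (for example, the latest version of Apache Xerces instead of the default Sun Xerces parser).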
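
One way to try a different parser is to ask JAXP for a specific factory class by name. The sketch below assumes the standalone Apache Xerces-J jar is on the classpath, where the factory class is org.apache.xerces.jaxp.DocumentBuilderFactoryImpl. The class name ParserSelector and the method newFactory are hypothetical. If the Apache class cannot be loaded, the sketch falls back to the JDK's built-in factory, so it will not make things worse but also will not change anything.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

final class ParserSelector {
    // Factory class shipped with standalone Apache Xerces-J (assumed to be on the classpath).
    static final String APACHE_XERCES_FACTORY =
            "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";

    /** Returns the Apache Xerces factory if available, otherwise the JDK default. */
    static DocumentBuilderFactory newFactory() {
        try {
            return DocumentBuilderFactory.newInstance(
                    APACHE_XERCES_FACTORY, ParserSelector.class.getClassLoader());
        } catch (FactoryConfigurationError e) {
            // Apache Xerces is not on the classpath; use the built-in parser.
            return DocumentBuilderFactory.newInstance();
        }
    }
}
```

Printing ParserSelector.newFactory().getClass().getName() confirms which implementation was actually picked up.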

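As previous commenters have suggested, providing a copy of your XML document along with your specific Java version (e.g., JDK 8.0u40b25) would allow for a more precise answer.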
