Python SAX Parser: resolveEntity

I'm having a hard time figuring out how to bind my own ResolveEntityHandler to a SAX parser. There is this answer on SO, but unfortunately I can't reproduce its results.

When I run the code below, which is essentially copied from that answer and only updated for Python 3,

import io
import xml.sax
from xml.sax.handler import ContentHandler

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # This method is only called for external entities. Must return a value.
    def resolveEntity(self, publicID, systemID):
        print ("TestHandler.resolveEntity(): %s %s" % (publicID, systemID))
        return systemID

    def skippedEntity(self, name):
        print ("TestHandler.skippedEntity(): %s" % (name))

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print ("TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID))

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print ('TestHandler.startElement():', summary)

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)

        stream = io.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException as e:
        print ("ERROR %s" % e)

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)

together with the external test.dtd

<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>

all I get is

TestHandler.startElement(): step: 
TestHandler.skippedEntity(): not

Process finished with exit code 0

So my questions are:

  1. Why is resolveEntity never called?
  2. How do I bind a ResolveEntityHandler to the parser?

What you are seeing is caused by a change in Python 3.7.1:

Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. The feature can be enabled again with method setFeature() on the parser object and argument feature_external_ges.
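To confirm that this change is what suppresses the callback, you can ask the parser for the current value of the feature with getFeature(). The following is a minimal sketch, assuming the default expat-based reader returned by xml.sax.make_parser(); ResolverSpy is a hypothetical name used only for illustration. Because the external DTD subset is never fetched, no test.dtd file is needed to run it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class ResolverSpy(ContentHandler):
    """Hypothetical handler that only records resolveEntity() calls."""

    def __init__(self):
        super().__init__()
        self.resolved = []

    def resolveEntity(self, publicID, systemID):
        self.resolved.append(systemID)
        return systemID


XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

parser = xml.sax.make_parser()
spy = ResolverSpy()
parser.setContentHandler(spy)
parser.setEntityResolver(spy)

# On Python 3.7.1+ this is falsy: external entities, including the
# external DTD subset, are not fetched unless the feature is enabled.
default_external_ges = bool(parser.getFeature(feature_external_ges))
print("feature_external_ges enabled by default:", default_external_ges)

# With the feature off, the reader never asks the resolver for test.dtd,
# so no file is opened and resolveEntity() is never called.
parser.parse(io.StringIO(XML))
print("resolveEntity() calls:", spy.resolved)
```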

To get the same behavior as in earlier versions, add the following import:

from xml.sax.handler import feature_external_ges

and, in the main function, before parser.parse() is called:

parser.setFeature(feature_external_ges, True)
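Putting the fix together, here is a self-contained sketch of the question's program with the feature enabled. It deviates from the original in two labeled ways: the handler (the hypothetical RecordingHandler) collects callbacks in a list instead of printing them, and resolveEntity() returns an xml.sax.xmlreader.InputSource whose character stream holds the DTD text, rather than returning the system ID string. Returning an InputSource is allowed by the EntityResolver interface, and here it means no test.dtd file has to exist on disk. The parse() helper is also hypothetical; it exists only to compare the enabled and default settings.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource

# Contents of test.dtd, served from memory in this sketch instead of from disk.
DTD = """<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
"""

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""


class RecordingHandler(ContentHandler):
    """Hypothetical handler that records SAX callbacks as tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def resolveEntity(self, publicID, systemID):
        self.events.append(("resolveEntity", publicID, systemID))
        # Hand back an InputSource with an in-memory character stream,
        # so the reader does not open a file or URL for systemID.
        source = InputSource(systemID)
        source.setCharacterStream(io.StringIO(DTD))
        return source

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        self.events.append(("unparsedEntityDecl", name, systemID, ndata))

    def skippedEntity(self, name):
        self.events.append(("skippedEntity", name))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, attrs.get("summary", "")))


def parse(xml_string, external_ges):
    """Hypothetical helper: parse xml_string and return the recorded events."""
    parser = xml.sax.make_parser()
    handler = RecordingHandler()
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.setDTDHandler(handler)
    # Must be set before parsing starts; off by default since 3.7.1.
    parser.setFeature(feature_external_ges, external_ges)
    parser.parse(io.StringIO(xml_string))
    return handler.events


events = parse(XML, external_ges=True)
for event in events:
    print(event)
```

With the feature enabled, the recorded events should start with a resolveEntity call for test.dtd, followed by the unparsed entity declaration for pic and a summary attribute that now reads "step: FOO". The &not; reference is still reported through skippedEntity, because test.dtd does not declare it. Calling parse(XML, external_ges=False) instead reproduces the behavior from the question.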