Xerces-C++ memory

I can't make sense of Xerces-C++ memory management.

Suppose I have this (example) XML file, "config.xml":

<?xml version="1.0" encoding="UTF-8"?>
<settings>
    <port>
        <reference>Ref1</reference>
        <label>1PPS A</label>
        <enabled>true</enabled>
    </port>
</settings>

and this code:

#include <xercesc/dom/DOM.hpp>

XERCES_CPP_NAMESPACE_USE

DOMElement *nextChildElement(const DOMElement *parent)
{
    DOMNode *node = (DOMNode *)parent->getFirstChild();
    while (node)
    {
        if (node->getNodeType() == DOMNode::ELEMENT_NODE)
            return (DOMElement *)node;
        node = node->getNextSibling();
    }
    return nullptr;
}

int main(int argc, char **argv)
{
    XMLPlatformUtils::Initialize();

    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    DOMDocument *doc = impl->createDocument(0, 0, 0);

    doc = parser->parseURI("config.xml");

    DOMElement *el = doc->getDocumentElement(); // <settings>
    el = nextChildElement(el);                  //   <port>
    el = nextChildElement(el);                  //     <reference>Ref1</reference>

    // Heap blows up here
    while (1) {
        char *cstr = XMLString::transcode(el->getTextContent());
        XMLString::release(&cstr); // cstr is "Ref1"
    }

    // and/or here
    while (1) {
        XMLCh *xstr = XMLString::replicate(el->getTextContent());
        char *cstr = XMLString::transcode(xstr); // cstr is "Ref1"
        XMLString::release(&cstr);
        XMLString::release(&xstr);
    }
}

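Why does the program's heap memory blow up in the while (1) loops? Either loop causes the same memory growth.

Note: I am using Visual Studio 2017 and tested this in several configurations, all with the same result.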

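The problem is that the function const XMLCh *getTextContent() allocates memory on the Document's heap (using its MemoryManager), and there is no provision for the caller to free that memory or mark it for reuse. So once the returned pointer goes out of scope in the caller, the memory is effectively orphaned until the whole document is released, at which point the MemoryManager frees all of its heap allocations.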

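The solution is to not use getTextContent() and instead use getNodeValue(), which returns a pointer to the node's existing data rather than reallocating it from the internal heap. Note that getNodeValue() has to be called on the Text child node, not on the element itself, because an element's node value is null.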
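To make the difference concrete, here is a toy model of the behaviour described above. It is not Xerces code: ToyArena, ToyNode, textContent() and nodeValue() are hypothetical names made up for this sketch, and the real MemoryManager is more involved. The only thing it is meant to show is that a call which copies into a document-owned heap grows that heap on every call, while a call which hands back a pointer to existing storage does not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a document-owned heap (hypothetical, not a Xerces class).
// Blocks are never freed individually; they go away only when the arena does.
class ToyArena {
public:
    // Copy s into a new block owned by the arena and return a pointer to it.
    const char *copy(const std::string &s) {
        blocks_.push_back(std::make_unique<std::string>(s));
        bytes_ += s.size() + 1; // count the terminating NUL as well
        return blocks_.back()->c_str();
    }
    std::size_t bytesHeld() const { return bytes_; }

private:
    std::vector<std::unique_ptr<std::string>> blocks_;
    std::size_t bytes_ = 0;
};

// Toy node that stores its own text and knows which arena it belongs to.
class ToyNode {
public:
    ToyNode(ToyArena &arena, std::string text)
        : arena_(arena), text_(std::move(text)) {}

    // Models getTextContent(): every call allocates a fresh copy in the arena.
    const char *textContent() const { return arena_.copy(text_); }

    // Models getNodeValue(): returns a pointer to storage the node already owns.
    const char *nodeValue() const { return text_.c_str(); }

private:
    ToyArena &arena_;
    std::string text_;
};

// Bytes held by the arena after calling textContent() the given number of times.
std::size_t arenaBytesAfterTextContent(int calls) {
    assert(calls >= 0);
    ToyArena arena;
    ToyNode node(arena, "Ref1");
    for (int i = 0; i < calls; ++i)
        node.textContent(); // result dropped, but the arena keeps the block
    return arena.bytesHeld();
}

// Bytes held by the arena after calling nodeValue() the given number of times.
std::size_t arenaBytesAfterNodeValue(int calls) {
    assert(calls >= 0);
    ToyArena arena;
    ToyNode node(arena, "Ref1");
    for (int i = 0; i < calls; ++i)
        node.nodeValue(); // no allocation, just a pointer to existing data
    return arena.bytesHeld();
}
```

In this model, arenaBytesAfterTextContent(n) grows linearly with n, while arenaBytesAfterNodeValue(n) stays at zero however many times it is called. That is the same shape as the heap growth seen in the while (1) loops of the question.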

From this (non)-bug report:

That aside, getTextContent does not work anyway. It's buggy as all get out and is effectively useless. You can't read the DOM that way or you'll get inaccurate data back under a variety of different circumstances if there are non-adjacent Text nodes (and if there aren't, you don't need to use it anyway since the direct node value will be all you need).

So a working version of the OP's example code might look like this:

#include <xercesc/dom/DOM.hpp>
#include <string>

XERCES_CPP_NAMESPACE_USE

DOMElement *nextChildElement(const DOMElement *parent)
{
    DOMNode *node = (DOMNode *)parent->getFirstChild();
    while (node)
    {
        if (node->getNodeType() == DOMNode::ELEMENT_NODE)
            return (DOMElement *)node;
        node = node->getNextSibling();
    }
    return nullptr;
}

std::string readTextNode(const DOMElement *el)
{
    std::string sstr;
    DOMNode *node = el->getFirstChild();
    // getFirstChild() returns nullptr for an empty element, so check before dereferencing
    if (node && node->getNodeType() == DOMNode::TEXT_NODE) {
        char *cstr = XMLString::transcode(node->getNodeValue());
        sstr = cstr;
        XMLString::release(&cstr);
    }
    return sstr;
}

int main(int argc, char **argv)
{
    XMLPlatformUtils::Initialize();

    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    // parseURI() returns a document owned by the parser, so no separate createDocument() call is needed
    DOMDocument *doc = parser->parseURI("config.xml");

    DOMElement *el = doc->getDocumentElement(); // <settings>
    el = nextChildElement(el);                  //   <port>
    el = nextChildElement(el);                  //     <reference>Ref1</reference>

    // No memory leak
    std::string nodestr;
    while (1) {
        nodestr = readTextNode(el); // nodestr is "Ref1"
    }
}