使用递归数据结构的 Boost 序列化最终导致堆栈溢出

Boost serialization with recursive data structure ends up in stack overflow

问题

我在使用具有递归数据结构的 Boost 序列化库时遇到问题。更准确地说,我想序列化一个矩阵,该矩阵由包含一个值的节点表示,并且每个节点都可以访问其邻居(顶部、底部、左侧、右侧)。为了访问节点,每个入口点都存储在 vector 中(即每行和每列的第一个和最后一个节点)。这是 Node class :

class Node
{
private:
    int v;
    Node* left; 
    Node* right;
    Node* top;
    Node* bottom;

public:
    Node() : v(rand() % 100), left(NULL), right(NULL), top(NULL), bottom(NULL)
    {

    }

    //Potential memory leak but that's not the topic
    void setLeft(Node* toSet) { left = toSet; }
    void setRight(Node* toSet) { right = toSet; }
    void setTop(Node* toSet) { top = toSet; }
    void setBottom(Node* toSet) { bottom = toSet; }

    Node* gLeft() { return left; }
    Node* gRight() { return right; }
    Node* gTop() { return top; }
    Node* gBottom() { return bottom; }

    int gValue() { return v; }

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& v;
        ar& left;
        ar& right;
        ar& top;
        ar& bottom;
    }
};

这里是 Matrix class,这个例子有一个 generateValues() 函数:

class Matrix
{
private:
    int m, n;
    std::vector<Node*> firstNodesPerRow;
    std::vector<Node*> lastNodesPerRow;
    std::vector<Node*> firstNodesPerCol;
    std::vector<Node*> lastNodesPerCol;
public:
    Matrix(int m, int n) : 
        m(m), n(n),
        firstNodesPerRow(m, NULL), lastNodesPerRow(m,NULL),
        firstNodesPerCol(n, NULL),lastNodesPerCol(n, NULL)
    {
        
    }

    void generateValues()
    {
        for (int i = 0; i < m; i++)
        {
            for (int j = 0; j < n; j++)
            {
                Node* toWrite = new Node();
                if (i > 0)
                {
                    toWrite->setTop(lastNodesPerCol.at(j));
                    lastNodesPerCol.at(j)->setBottom(toWrite);
                    lastNodesPerCol.at(j) = toWrite;
                }
                else
                {
                    firstNodesPerCol.at(j) = toWrite;
                    lastNodesPerCol.at(j) = toWrite;
                }
                if (j > 0)
                {
                    toWrite->setLeft(lastNodesPerRow.at(i));
                    lastNodesPerRow.at(i)->setRight(toWrite);
                    lastNodesPerRow.at(i) = toWrite;
                }
                else
                {
                    firstNodesPerRow.at(i) = toWrite;
                    lastNodesPerRow.at(i) = toWrite;
                }
            }
        }
    }

    template<class Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& m;
        ar& n;
        ar& firstNodesPerRow;
        ar& firstNodesPerCol;
        ar& lastNodesPerRow;
        ar& lastNodesPerCol;
    }
};

所以我想实现的是序列化和反序列化一个Matrix。这是我的 main 函数:

#include <cstdlib>
#include <sstream>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/vector.hpp>

int main(int argc, char* argv[])
{
    int m = 10; int n = 10;
    Matrix toSerialize(m,n);
    toSerialize.generateValues();

    /*
        1) Serialize
    */
    std::ostringstream oss;
    boost::archive::text_oarchive oa(oss);
    oa << toSerialize;

    std::string serialiazedData = oss.str();

    /*
        2) Deserialize
    */
    Matrix result(m,n);
    std::stringstream serializedDataStream(serialiazedData);
    boost::archive::text_iarchive ia(serializedDataStream);
    ia >> result;

    return EXIT_SUCCESS;
}

问题如下:给定足够大的值 mnmainstack-overflow 异常结束。我知道这是来自 Nodeserialize 方法,因为为了序列化一个节点,它需要序列化邻居等等......我发现 似乎是完全相同的问题。答案作为起点很有趣,但我很难实施它,因为它不够精确。我的理解是,要解决我的问题,我需要:

  1. 以迭代的方式序列化节点,这样到邻居的时候,对象已经序列化了,没有stack-overflow;
  2. 序列化拓扑,在我的案例中,它通过 Node
  3. 的指针 top,bottom,right,left 表示

我在实际实施这个解决方案时遇到了麻烦,因为我能想象到的第 1 点的唯一方法是在 [=16] 的 serialize 方法中删除 top,bottom,right,left 的序列化=],但我无法实现第 2 点?

编辑:我做了一个矩阵图来帮助阅读。请注意,m x n 矩阵中不一定有 m x n 个节点。

编辑 2:我想到的解决方案(不起作用,在反序列化时以堆栈溢出结束)。

//Class Node
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& v;
}

//Class Matrix
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& m;
    ar& n;

    for (int i = 0; i < m; i++)
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            ar& current;
            current = current->gRight();
        }
    }

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;
}

解决方案

解决方法的解释在post标记的答案中给出。这是此解决方案的实现:

// class Node
template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& v;
}

// some buffer struct
struct Neighbors
{
    Node* top;
    Node* bottom;
    Node* left;
    Node* right;

    template <typename Archive>
    void serialize(Archive& ar, const unsigned int version)
    {
        ar& top;
        ar& bottom;
        ar& left;
        ar& right;
    }
};

//class Matrix
template<class Archive>
void save(Archive& ar, const unsigned int version) const
{
    std::map<Node*, Neighbors> neighborsPerNode;

    for (int i = 0; i < m; i++)
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            neighborsPerNode[current] = {
                current->gTop(),
                current->gBottom(),
                current->gLeft(),
                current->gRight(),
            };
            current = current->gRight();
        }
    }

    ar& neighborsPerNode;

    ar& m;
    ar& n;

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;
}
template<class Archive>
void load(Archive& ar, const unsigned int version)
{
    // Warning ALL the nodes are browsed 2 times :
    //  1 - to deserialize neighborsPerNode (below)
    //  2 - in the for loop, to create the links between the nodes
    
    std::map<Node*, Neighbors> neighborsPerNode;
    ar& neighborsPerNode;

    ar& m;
    ar& n;

    ar& firstNodesPerRow;
    ar& firstNodesPerCol;
    ar& lastNodesPerRow;
    ar& lastNodesPerCol;

    for (int i = 0; i < m; i++)
    {
        Node* current = firstNodesPerRow.at(i);
        while (1)
        {
            if (current == NULL) { break; }
            Neighbors myNeighbors = neighborsPerNode.at(current);
            current->setTop(myNeighbors.top);
            current->setBottom(myNeighbors.bottom);
            current->setLeft(myNeighbors.left);
            current->setRight(myNeighbors.right);
            current = current->gRight();    
        }
    }
}
BOOST_SERIALIZATION_SPLIT_MEMBER()

为什么不简单地按矩阵的顺序序列化所有元素并完全避免函数调用递归,例如:

template<class Archive>
void serialize(Archive& ar, const unsigned int version)
{
    ar& m;
    ar& n;
    for (int i = 0; i < m; ++i)
    {
        Node* node = firstNodesPerRow[i];
        for (int j = 0; j < n; ++j)
        {
            ar & node->gValue();
            node = node->gRight();
        }
    }
}

顺便说一句,这在 保存 矩阵中有效。我认为你需要在保存和加载版本中专门化序列化函数,因为对于加载版本你需要:

  1. 加载n,m
  2. 分配所有节点并填充矩阵中的节点指针向量
  3. 按照保存时的相同顺序加载所有值
    template<class Archive>
    void save(Archive & ar, const unsigned int version) const
    {
        ...
    }
    template<class Archive>
    void load(Archive & ar, const unsigned int version)
    {
        ...
    }
    BOOST_SERIALIZATION_SPLIT_MEMBER()

在节点可能丢失的复杂情况下,同样的想法也适用。而且您仍然需要在 save/load 之间拆分,添加分配以加载。 但是需要更多的簿记才能正确保存并再次加载。

例如,您可以先如上所述遍历所有节点,但创建一个从每个节点指针值到唯一递增 ID 号的映射。当您保存每个节点的值时(如上一行一行),还保存每个方向的节点 ID。加载时,首先创建一个空映射:ID -> 节点指针。然后逐行再次恢复节点,同时从地图中恢复邻居指针。当然,每当第一次遇到ID时,您都需要分配一个新节点。