C++ 树遍历：为什么在这种情况下循环比递归慢？

Question

我需要在我的 C++ 代码中多次遍历一棵树，树的深度可能会因一次迭代而异。我也可能有条件地提前中断树遍历。在分析我的代码时（使用 Visual studio 编译器），我注意到树遍历部分是我代码中最大的瓶颈，因此我需要尽可能地加快该部分的速度。

下面是我的代码的描述和简化的可运行版本，以展示我目前遇到的问题。

在使用递归时，我注意到我可以通过有条件地提前中断递归来加速我的代码。然而，我实现的提前中断根本没有提高速度（见代码）。我认为通过使用循环而不是递归，早破会更容易实现，所以我将我的树遍历转换为循环。令人惊讶的是，循环版本比递归版本慢了一个数量级！此外，early-break 最多只提高了 10% 的速度，这令人惊讶，因为这是深度优先搜索遍历，当 break 发生时，树的很大一部分没有被遍历。因此，预计至少可以提速 50-100%。

我的问题：

为什么在循环版本下面的特定情况下是一个命令幅度变慢了？！
为什么提前中断并没有提高多少速度（对于循环和递归）
非常感谢以下案例的任何其他性能提示。

#include <iostream>
#include <vector>
#include <stack>
#include <chrono>

using namespace std;
using namespace std::chrono;

class Node {
public:
    int id;
    int left = -1;
    int right = -1;
    int count = 0;

    Node(int _id) { id = _id; }
};
std::vector<Node> nodes;

//1) recursive tree traversal
void recursive(int node) {
    if (nodes[node].left == -1) {
        nodes[node].count++;
    }
    else {
        recursive(nodes[node].right);
        recursive(nodes[node].left);
    }
}

//2) recursive tree traversal with conditional break
void recursive2(int node, bool* stop) {
    if (*stop == false) {
        if (nodes[node].left == -1) {
            nodes[node].count++;
            if (rand() % 2 == 0) { *stop = true; } //conditional break
        }
        else {
            recursive2(nodes[node].right, stop);
            if (*stop == false) {
                recursive2(nodes[node].left, stop);
            }
        }
    }
}

// loop traversal
void loop(int node) {
    stack<int> stack;
    stack.push(node);
    while (stack.size() > 0) {
        node = stack.top();
        stack.pop();
        if (nodes[node].left == -1) {
            nodes[node].count++;
            //if (rand() % 2 == 0) { break; } // conditional break
        }
        else {
            stack.push(nodes[node].right);
            stack.push(nodes[node].left);
        }
    }
}


int main()
{
    for (int i = 0; i < 7; i++) {
        nodes.push_back(Node(i));
    }
    // make a simple tree /node 6 is the root
    nodes[4].left = nodes[0].id;
    nodes[4].right = nodes[1].id;
    nodes[5].left = nodes[2].id;
    nodes[5].right = nodes[3].id;
    nodes[6].left = nodes[4].id;
    nodes[6].right = nodes[5].id;


    /// speed comparison 
    int n = 10000000;
    int root_node = 6;

    auto start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) { recursive(root_node); }
    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<milliseconds>(stop - start);
    cout << "recursion:" << duration.count() << endl;

    start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) {
        bool stop = false;
        recursive2(root_node, &stop);
    }
    stop = high_resolution_clock::now();
    duration = duration_cast<milliseconds>(stop - start);
    cout << "recursion with early-break:" << duration.count() << endl;

    start = high_resolution_clock::now();
    for (int i = 0; i < n; i++) { loop(root_node); }
    stop = high_resolution_clock::now();
    duration = duration_cast<milliseconds>(stop - start);
    cout << "loop:" << duration.count() << endl;
}

Answer 1

与您要遍历的迭代次数相比，您正在遍历的树是如此之小运行，以至于管理堆栈对象的动态内存会使您认为的任何收益相形见绌（我们稍后会重新讨论）你正在循环执行它。

让我们尝试消除内存分配开销，看看速度会发生什么变化。

作为参考，这是我从你的代码中得到的，发布在我的本地 Visual Studio x64-Release build 上。

recursion:60
recursion with early-break:70
loop:2088

首先，std::stack 使用 std::deque，这对小筹码来说不是很好。切换到向量支持的堆栈应该会让事情变得更好。由于它是一行更改，因此没有理由至少尝试一下：

std::stack<int, std::vector<int>> stack;

recursion:58
recursion with early-break:68
loop:1853

确实如此！这并不惊天动地，但这是我们走在正确轨道上的好兆头。如果我们担心进行一次大遍历所需的时间，那可能就足够了。但是我们关心的是做 10000000 微小的遍历，所以我们需要走得更远：完全摆脱内存分配：

// I "could" allocate the data as a local variable, 
// but then it would be on the stack, completely defeating the purpose.
// This would normally be a recyclable heap-based chunk of memory passed to
// the function.
std::array<int, 50> g_stack; // just for the example, don't actually do this.
void loop(int node) {
  auto top = g_stack.begin();
  auto push = [&](int v) {*(top++) = v;};
  auto pop = [&]() {--top;};

  push(node);
  while (top != g_stack.begin()) {
    node = *(top-1);
    pop();
    if (nodes[node].left == -1) {
      nodes[node].count++;
    }
    else {
      push(nodes[node].right);
      push(nodes[node].left);
    }
  }
}

recursion:61
recursion with early-break:68
loop:65

现在我们正在谈论！但它仍然没有击败递归。这是怎么回事？

循环总是比使用递归更快的前提并不普遍适用。

基于循环的方法相对于递归的主要优势不是速度，而是它解决了递归的最大问题：运行出栈的可能性 space。通过使用循环，您可以比递归函数调用更深入地“递归”，因为操作堆栈位于堆上，通常有更多可用空间。

有时编译器处理循环代码比处理递归做得更好，导致运行时间更快，但这从来都不是给定的。

C++ 树遍历：为什么在这种情况下循环比递归慢？

C++ Tree traversal: Why Looping is slower than Recursion in this case?

c++

tree

recursion

performance

depth-first-search