Why is allocation on the heap faster than allocation on the stack?

Question

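As far as my knowledge of resource management goes, allocating something on the heap (operator new) should always be slower than allocating on the stack (automatic storage), because the stack is a LIFO-based structure: it requires minimal bookkeeping, and the pointer to the next address to allocate is trivial to compute.

So far, so good. Now look at the following code: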

/* ...includes... */

using std::cout;
using std::cin;
using std::endl;

int bar() { return 42; }

int main()
{
    auto s1 = std::chrono::steady_clock::now();
    std::packaged_task<int()> pt1(bar);
    auto e1 = std::chrono::steady_clock::now();

    auto s2 = std::chrono::steady_clock::now();
    auto sh_ptr1 = std::make_shared<std::packaged_task<int()> >(bar);
    auto e2 = std::chrono::steady_clock::now();

    auto first = std::chrono::duration_cast<std::chrono::nanoseconds>(e1-s1);
    auto second = std::chrono::duration_cast<std::chrono::nanoseconds>(e2-s2);

    cout << "Regular: " << first.count() << endl
         << "Make shared: " << second.count() << endl;

    pt1();
    (*sh_ptr1)();

    cout << "As you can see, both are working correctly: " 
         << pt1.get_future().get() << " & " 
         << sh_ptr1->get_future().get() << endl;

    return 0;
}

The results seem to contradict what I explained above:

Regular: 6131
Make shared: 843
As you can see, both are working correctly: 42 & 42
Program ended with exit code: 0

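In the second measurement, apart from the call to operator new, the constructor of the std::shared_ptr (auto sh_ptr1) also has to finish. I can't seem to understand why this is faster than the regular allocation.

What is the explanation for this?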

Answer 1

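The problem is that the first call to the constructor of std::packaged_task is responsible for initializing a load of per-thread state, and that cost is then unfairly attributed to pt1. This is a common problem in benchmarking (particularly microbenchmarking) and is alleviated by warm-up; try reading How do I write a correct micro-benchmark in Java?

If I copy your code but run both parts once beforehand, the results are the same to within the limits of the system clock resolution. This demonstrates another issue with microbenchmarking: you should run small tests many times so that the total time can be measured accurately.

With warm-up and running each part 1000 times, I get the following results (example):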

Regular: 132.986
Make shared: 211.889

The difference (approx. 80ns) accords well with the rule of thumb that malloc takes 100ns per call.

Answer 2

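This is a problem with your micro-benchmark: if you swap the order in which you take the measurements, you get the opposite result (demo).

It looks like the first call of the std::packaged_task constructor takes a big hit. Adding an untimed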

std::packaged_task<int()> ignore(bar);

before measuring the time fixes this problem (demo):

Regular: 505
Make shared: 937

Answer 3

I tried your example at ideone and got a result similar to yours:

Regular: 67950 
Make shared: 696

Then I reversed the order of the tests:

auto s2 = std::chrono::steady_clock::now();
auto sh_ptr1 = std::make_shared<std::packaged_task<int()> >(bar);
auto e2 = std::chrono::steady_clock::now();

auto s1 = std::chrono::steady_clock::now();
std::packaged_task<int()> pt1(bar);
auto e1 = std::chrono::steady_clock::now();

and found the opposite result:

Regular: 548
Make shared: 68065

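So this is not a difference between stack and heap, but a difference between the first and the second call. Maybe you need to look into the internals of std::packaged_task.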


c++

heap-memory

new-operator

stack-memory

automatic-storage