Why is std::async slow compared to simple detached threads?

Question

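I've been told several times that I should use std::async for fire-and-forget types of tasks with the std::launch::async parameter (so it does its magic on a new thread of execution, preferably).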

Encouraged by these statements, I wanted to see how std::async compares to:

sequential execution
a simple detached std::thread
my simple async "implementation"

My naive async implementation looks like this:

#include <functional>
#include <future>
#include <thread>

template <typename F, typename... Args>
auto myAsync(F&& f, Args&&... args) -> std::future<decltype(f(args...))>
{
    std::packaged_task<decltype(f(args...))()> task(std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    auto future = task.get_future();

    std::thread thread(std::move(task));
    thread.detach();

    return future;
}

Nothing special here: it packs the functor f together with its arguments into a std::packaged_task, launches it on a new std::thread which is detached, and returns the std::future from the task.
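As a usage sketch, the helper below is the question's myAsync unchanged, driven by a hypothetical multiply() task (not part of the original post) to show how the result travels back through the packaged_task's future. One thing worth keeping in mind for the benchmark: a future obtained from std::packaged_task does not wait in its destructor, so once it is dropped nothing ties the detached thread's lifetime to the caller.

```cpp
#include <functional>
#include <future>
#include <thread>

// The naive helper from the question: wrap the call in a packaged_task,
// run it on a detached thread, and hand back the task's future.
template <typename F, typename... Args>
auto myAsync(F&& f, Args&&... args) -> std::future<decltype(f(args...))>
{
    std::packaged_task<decltype(f(args...))()> task(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...));
    auto future = task.get_future();

    std::thread thread(std::move(task));
    thread.detach();  // nobody joins this thread; only the future reports completion

    return future;
}

// Hypothetical example task, used only to show how a value comes back.
int multiply(int a, int b) { return a * b; }

// Runs multiply(6, 7) through myAsync and blocks on get() for the result.
int multiplyViaMyAsync()
{
    std::future<int> result = myAsync(multiply, 6, 7);
    return result.get();  // waits until the detached thread has run the task
}
```

Calling multiplyViaMyAsync() yields 42; the caller only learns that the work is done because it explicitly calls get().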

Now the code for measuring the execution time with std::chrono::high_resolution_clock:

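#include <chrono>
#include <cstdlib>
#include <future>
#include <iostream>
#include <thread>

void someTask();  // defined below

int main(void)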
{
    constexpr unsigned short TIMES = 1000;

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        someTask();
    }
    auto dur = std::chrono::high_resolution_clock::now() - start;

    auto tstart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        std::thread t(someTask);
        t.detach();
    }
    auto tdur = std::chrono::high_resolution_clock::now() - tstart;

    std::future<void> f;
    auto astart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        f = std::async(std::launch::async, someTask);
    }
    auto adur = std::chrono::high_resolution_clock::now() - astart;

    auto mastart = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < TIMES; ++i)
    {
        f = myAsync(someTask);
    }
    auto madur = std::chrono::high_resolution_clock::now() - mastart;

    std::cout << "Simple: " << std::chrono::duration_cast<std::chrono::microseconds>(dur).count() <<
    std::endl << "Threaded: " << std::chrono::duration_cast<std::chrono::microseconds>(tdur).count() <<
    std::endl << "std::async: " << std::chrono::duration_cast<std::chrono::microseconds>(adur).count() <<
    std::endl << "My async: " << std::chrono::duration_cast<std::chrono::microseconds>(madur).count() << std::endl;

    return EXIT_SUCCESS;
}

where someTask() is a simple function in which I wait a little, simulating some work being done:

void someTask()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

And finally, my results (in microseconds):

Sequential: 1263615
Threaded: 47111
std::async: 821441
My async: 30784

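Can anyone explain these results? It seems like std::async is much slower than my naive implementation, or than plain and simple detached std::threads. Why is that? After these results, is there any reason to use std::async at all?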

(Note that I also ran this benchmark with clang++ and g++, and the results were very similar.)

Update:

After reading Dave S's answer, I updated my little benchmark as follows:

std::future<void> f[TIMES];
auto astart = std::chrono::high_resolution_clock::now();
for (int i = 0; i < TIMES; ++i)
{
    f[i] = std::async(std::launch::async, someTask);
}
auto adur = std::chrono::high_resolution_clock::now() - astart;

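So the std::futures are no longer destroyed (and therefore joined) on every iteration. After this change, std::async produces results similar to my implementation and to the detached std::threads.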

Answer 1

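One key difference is that the future returned by async joins the thread when the future is destroyed, or, in your case, when it is replaced with a new value.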

This means it has to execute someTask() and join the thread, both of which take time. None of your other tests do that; they just spawn the tasks independently.
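To see the cost of "replaced with a new value" in isolation, here is a small sketch that times a single reassignment of a std::async future. slowTask() is a hypothetical stand-in for someTask() that sleeps 100 ms so the wait is easy to measure, and std::chrono::steady_clock is used as the timer; both are assumptions of this example, not part of the question.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for someTask(), sleeping long enough to be measurable.
void slowTask() { std::this_thread::sleep_for(std::chrono::milliseconds(100)); }

// Milliseconds spent in one "f = std::async(...)" while the previous task is still running.
long long msToReassignAsyncFuture()
{
    std::future<void> f = std::async(std::launch::async, slowTask);
    auto start = std::chrono::steady_clock::now();
    // The right-hand side launches a second task; the move-assignment then releases
    // the first shared state. Because that state came from std::async, releasing it
    // waits for the first slowTask() to finish, i.e. the implicit join.
    f = std::async(std::launch::async, slowTask);
    auto elapsed = std::chrono::steady_clock::now() - start;
    f.wait();  // let the second task finish before returning
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

On common implementations the measured time should be close to the full 100 ms sleep, which is the same per-iteration wait the original std::async loop was paying.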

Answer 2

std::async returns a special std::future. This future has a ~future that does a .wait().
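A minimal sketch contrasting the two kinds of future: leaving the scope of a std::async future blocks until its task returns, while destroying a future obtained from a std::promise just drops a reference. slowTask() (a 100 ms sleep) and the timing helpers are hypothetical names chosen for this illustration.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in task that sleeps long enough for a blocking wait to be visible.
void slowTask() { std::this_thread::sleep_for(std::chrono::milliseconds(100)); }

// Milliseconds spent leaving a scope that holds a std::async future whose task is still running.
long long msToDestroyAsyncFuture()
{
    auto start = std::chrono::steady_clock::now();
    {
        std::future<void> f = std::async(std::launch::async, slowTask);
    }  // ~future() waits here until slowTask() has returned
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}

// Same measurement for a future from a std::promise that is never fulfilled:
// its destructor only releases the shared state and returns immediately.
long long msToDestroyPromiseFuture()
{
    std::promise<void> p;
    auto start = std::chrono::steady_clock::now();
    {
        std::future<void> f = p.get_future();
    }  // no wait: this shared state was not created by std::async
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

The first function should report roughly the full sleep, the second close to zero; the future returned by myAsync behaves like the second case.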

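So your examples are fundamentally different. The slow one actually completes the tasks within your timing. The fast ones just queue up the tasks and forget how to ever know that a task is done. Since the behavior of programs that let threads outlive the end of main is unpredictable, this should be avoided.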

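The correct way to compare the tasks is to store the resulting futures as you spawn them and .wait()/.join() them all before the timer stops, or to avoid destroying the objects until after the timer has stopped. The latter, however, makes the sequential version look worse than it actually is.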

You do need to join/wait before starting the next test; otherwise you are stealing resources from its timing.
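Following that advice, a fairer benchmark might look like the sketch below: every variant keeps a handle to each task (a joined std::thread or a stored std::future) and stops the clock only after all of them have finished, so no work leaks into the next measurement. The function names are hypothetical, and std::chrono::steady_clock replaces high_resolution_clock because it is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Same simulated work as the question's someTask().
void someTask() { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }

using Clock = std::chrono::steady_clock;  // monotonic; high_resolution_clock need not be

long long toMicros(Clock::duration d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// Every timer below stops only after all n tasks have finished.
long long timeSequential(int n)
{
    auto start = Clock::now();
    for (int i = 0; i < n; ++i)
        someTask();
    return toMicros(Clock::now() - start);
}

long long timeJoinedThreads(int n)
{
    auto start = Clock::now();
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (int i = 0; i < n; ++i)
        threads.emplace_back(someTask);
    for (auto& t : threads)
        t.join();  // join instead of detach, so the work counts towards the measurement
    return toMicros(Clock::now() - start);
}

long long timeAsync(int n)
{
    auto start = Clock::now();
    std::vector<std::future<void>> futures;
    futures.reserve(n);
    for (int i = 0; i < n; ++i)
        futures.push_back(std::async(std::launch::async, someTask));  // keep every future alive
    for (auto& f : futures)
        f.wait();
    return toMicros(Clock::now() - start);
}
```

With this structure, comparing timeJoinedThreads(n) against timeAsync(n) measures thread creation plus the actual work in both cases, rather than one variant finishing its tasks and the other merely launching them.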

Note that moving a future removes the wait from the source: the moved-from future no longer blocks on destruction.
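A short sketch of that last point, again with a hypothetical 100 ms slowTask(): after std::move, the source future is empty (valid() is false), so destroying it returns immediately, and the wait travels with the future that received the shared state.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in task that sleeps long enough for a blocking wait to be visible.
void slowTask() { std::this_thread::sleep_for(std::chrono::milliseconds(100)); }

// After a move, the source future no longer refers to any shared state.
bool movedFromFutureIsEmpty()
{
    std::future<void> f = std::async(std::launch::async, slowTask);
    std::future<void> g = std::move(f);  // g now owns the shared state (and the implicit wait)
    bool sourceEmpty = !f.valid();
    g.wait();
    return sourceEmpty;
}

// Destroying the moved-from future is cheap; the wait happens wherever g ends up.
long long msToDestroyMovedFromFuture()
{
    std::future<void> g;
    auto start = std::chrono::steady_clock::now();
    {
        std::future<void> f = std::async(std::launch::async, slowTask);
        g = std::move(f);
    }  // f is empty here, so its destructor has nothing to wait for
    auto elapsed = std::chrono::steady_clock::now() - start;
    g.wait();  // the task is still tied to g, which waits for it
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}
```

This is why storing the futures in an array, as in the updated benchmark, avoids the per-iteration wait: each future stays alive, and the waits are deferred until the array is destroyed.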
