Parallel version of `std::generate` performs worse than the sequential one

Question

I am trying to parallelize some legacy code using the execution policies introduced in C++17. My sample code is below:

#include <cstdlib>
#include <chrono>
#include <iostream>
#include <algorithm>
#include <execution>
#include <vector>

using Clock = std::chrono::high_resolution_clock;
using Duration = std::chrono::duration<double>;

constexpr auto NUM = 100'000'000U;

double func()
{
  return rand();
}

int main()
{
  std::vector<double> v(NUM);
  // ------ feature testing
  std::cout << "__cpp_lib_execution         : " << __cpp_lib_execution << std::endl;
  std::cout << "__cpp_lib_parallel_algorithm: " << __cpp_lib_parallel_algorithm << std::endl;
  // ------ fill the vector with random numbers sequentially
  auto const startTime1 = Clock::now();
  std::generate(std::execution::seq, v.begin(), v.end(), func);
  Duration const elapsed1 = Clock::now() - startTime1;
  std::cout << "std::execution::seq: " << elapsed1.count() << " sec." << std::endl;
  // ------ fill the vector with random numbers in parallel
  auto const startTime2 = Clock::now();
  std::generate(std::execution::par, v.begin(), v.end(), func);
  Duration const elapsed2 = Clock::now() - startTime2;
  std::cout << "std::execution::par: " << elapsed2.count() << " sec." << std::endl;
}

Program output on my Linux desktop:

__cpp_lib_execution         : 201902
__cpp_lib_parallel_algorithm: 201603
std::execution::seq: 0.971162 sec.
std::execution::par: 25.0349 sec.

Why does the parallel version perform about 25 times worse than the sequential one?

Compiler: g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0

Answer 1

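The thread-safety of `rand` is implementation-defined. This means that either:

1. your code is incorrect in the parallel case, or
2. it is effectively serial, guarded by a heavily contended lock, which adds substantial overhead in the parallel case and results in very poor performance.

Based on your results, I'd guess #2 applies, but it could be both.

Either way, the answer is: `rand` is a terrible test case for parallelism.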
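For comparison, here is a minimal sketch of a generator that avoids shared state entirely, assuming it is acceptable to replace `rand` with the `<random>` facilities. The names `thread_local_random` and `fill_parallel` are hypothetical and not part of the original question. Each thread gets its own engine via `thread_local`, so there is no lock to contend on and no data race. With GCC's libstdc++ the parallel policies are typically backed by TBB, so linking with `-ltbb` is usually required.

```cpp
#include <algorithm>
#include <execution>
#include <random>
#include <vector>

// Hypothetical replacement for func(): every thread lazily constructs its own
// engine and distribution, so concurrent calls never touch shared state.
double thread_local_random()
{
  thread_local std::mt19937_64 engine{std::random_device{}()};
  thread_local std::uniform_real_distribution<double> dist{0.0, 1.0};
  return dist(engine);
}

// Hypothetical helper: fills the vector using the parallel execution policy.
void fill_parallel(std::vector<double>& v)
{
  std::generate(std::execution::par, v.begin(), v.end(), thread_local_random);
}
```

Swapping `func` for `thread_local_random` in the benchmark above should give a fairer comparison between `seq` and `par`, though the actual speedup depends on the machine and the library backend. Because each engine is seeded from `std::random_device` and work is distributed across threads at run time, the generated sequence is not reproducible between runs.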
