Parallel version of `std::generate` performs worse than the sequential one
I'm trying to parallelize some legacy code using the execution policies introduced in C++17. My sample code is below:
#include <cstdlib>
#include <chrono>
#include <iostream>
#include <algorithm>
#include <execution>
#include <vector>

using Clock = std::chrono::high_resolution_clock;
using Duration = std::chrono::duration<double>;

constexpr auto NUM = 100'000'000U;

double func()
{
    return rand();
}

int main()
{
    std::vector<double> v(NUM);

    // ------ feature testing
    std::cout << "__cpp_lib_execution : " << __cpp_lib_execution << std::endl;
    std::cout << "__cpp_lib_parallel_algorithm: " << __cpp_lib_parallel_algorithm << std::endl;

    // ------ fill the vector with random numbers sequentially
    auto const startTime1 = Clock::now();
    std::generate(std::execution::seq, v.begin(), v.end(), func);
    Duration const elapsed1 = Clock::now() - startTime1;
    std::cout << "std::execution::seq: " << elapsed1.count() << " sec." << std::endl;

    // ------ fill the vector with random numbers in parallel
    auto const startTime2 = Clock::now();
    std::generate(std::execution::par, v.begin(), v.end(), func);
    Duration const elapsed2 = Clock::now() - startTime2;
    std::cout << "std::execution::par: " << elapsed2.count() << " sec." << std::endl;
}
Program output on my Linux desktop:
__cpp_lib_execution : 201902
__cpp_lib_parallel_algorithm: 201603
std::execution::seq: 0.971162 sec.
std::execution::par: 25.0349 sec.
Why does the parallel version perform roughly 25 times worse than the sequential one?
Compiler: g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
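The thread-safety of `rand` is implementation-defined. This means that either:
- your code is wrong in the parallel case (concurrent calls race on the generator's shared state), or
- it is effectively serial, guarded by a highly contended lock, which dramatically increases overhead in the parallel case and gives terrible performance.
Based on your results, I'm guessing the second case applies, but it could be both.
Either way, the answer is: `rand` is a terrible test case for parallelism.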