Concurrency vs Parallelism - Specifically in C++
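I understand the basic difference between the two, and I often use std::async in my programs, which gives me concurrency.
Are there any reliable/notable libraries that provide parallelism in C++? (I know this may become a feature in C++17.) If so, what has your experience with them been?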
Thanks!
Barbara
Threading Building Blocks (TBB) is a templated C++ library for task parallelism. The library contains various algorithms and data structures specialized for task parallelism. I have had success using parallel_for as well as parallel_pipeline to greatly speed up computations. With a little extra coding, TBB's parallel_for can take a serial for loop that is suitable for parallel execution and run it in parallel (see example here). TBB's parallel_pipeline can execute a chain of dependent tasks, with the option of running each one in parallel or serially (see example here). There are many more examples on the web, especially at software.intel.com and here on Stack Overflow (see here).
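OpenMP is an API for thread parallelism that is accessed primarily through compiler directives. Although I prefer the richer feature set that TBB provides, OpenMP can be a quick way to test parallel algorithms and code (just add a pragma and adjust a few build settings). Once something has been tested and experimented with, I have found that converting some uses of OpenMP to TBB can be done fairly easily. This is not to say that OpenMP is unsuitable for serious coding. In fact, there are cases where people prefer OpenMP over TBB (one being that, because it relies mainly on pragmas, switching to serial execution is easier than with TBB), as described in this discussion. There are a number of open-source projects that make use of OpenMP (e.g., on Wikipedia), and there are many OpenMP tutorials on the web, including many questions here on Stack Overflow.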
I previously neglected to discuss SIMD (single instruction, multiple data), which provides data parallelism. As pointed out in the comments below, OpenMP is an option for exploring SIMD (check this link). Extensions to instruction sets such as SSE and AVX (both extensions to the x86 instruction set architecture), as well as NEON (ARM architecture), are also worth exploring. I have had both good and bad experiences with SSE and AVX. The good is that they can provide a nice speedup for certain algorithms (in particular, I have used Intel intrinsics). The bad is that the ability to use these instructions depends on support in the specific CPU, which can lead to unexpected runtime exceptions (e.g., illegal-instruction faults) on machines that lack it.
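For parallelism in mathematics specifically, I have had good experiences using Intel MKL (which now has a no-cost option) as well as OpenBLAS. These libraries provide optimized, parallel, and/or vectorized implementations of common mathematical functions and routines (e.g., BLAS and LAPACK). Many more libraries are available that deal specifically with mathematics and involve optimized parallelism to some degree. Although they may not provide lower-level parallel building blocks (e.g., the ability to manipulate threads or schedule tasks), it is well worth leveraging (and contributing to) the large body of research and work in computational mathematics. The same can be said for areas of interest other than mathematics.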