std::valarray and parallelization

Question

This may be a silly question.

I read on this website that:

The valarray specification allows for libraries to implement it with several efficiency optimizations, such as parallelization of certain operations

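What is the current state of std::valarray parallelization on different platforms and compilers: GCC, VS2010/2013, clang?

In particular, with the standard threading support in C++11.

UPD: If some compilers do not support this, what is the best way to apply a function to the elements of a container using multiple threads? Obviously a naive solution based on std::thread would be short, but maybe better solutions exist?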
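For reference, the naive std::thread approach mentioned above could look roughly like the sketch below. The helper name parallel_apply and its num_threads parameter are made up for illustration; the sketch assumes that f has no side effects shared between elements and relies on std::valarray storing its elements contiguously. It does no error handling if a thread fails to start.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <valarray>
#include <vector>

// Hypothetical helper (not a standard facility): applies f to every element
// of v in place, giving each thread one contiguous chunk of the index range.
template <typename T, typename F>
void parallel_apply(std::valarray<T>& v, F f, unsigned num_threads = 0)
{
    const std::size_t n = v.size();
    if (n == 0) return;

    if (num_threads == 0) {
        // hardware_concurrency() may return 0 when the value is unknown.
        num_threads = std::max(1u, std::thread::hardware_concurrency());
    }
    // Never start more threads than there are elements.
    num_threads = static_cast<unsigned>(std::min<std::size_t>(num_threads, n));
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // rounded up

    T* data = &v[0];  // valarray elements are stored contiguously
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        // Each worker gets its own copy of f and a disjoint index range.
        workers.emplace_back([data, begin, end, f]() mutable {
            for (std::size_t i = begin; i < end; ++i) data[i] = f(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

A call such as parallel_apply(v, [](double x) { return 2.0 * x + 1.0; }) would then update v in place. For small arrays the cost of starting threads will likely outweigh any gain.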

Answer 1

Intel seems to have done some work on this.

As for the others: I don't think so. cppreference says:

Some C++ standard library implementations use expression templates to implement efficient operations on std::valarray (e.g. GNU libstdc++ and LLVM libc++). Only rarely are valarrays optimized any further, as in e.g. Intel Parallel Studio.

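I also could not find any documentation saying that libc++ or libstdc++ do anything fancy in this regard, and usually nobody hides cool features. :)

Regarding MSVC: I once had code using std::valarray that compiled but did not link, because Microsoft had "forgotten" to implement some methods. That is no proof of course, but to me it does not sound like anything cool is happening there either. I could not find any documentation of special features there either.

So what can we do?

First, we can use parallel mode to have libstdc++ parallelize the following algorithms with OpenMP wherever it considers that useful: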

std::accumulate    
std::adjacent_difference    
std::inner_product    
std::partial_sum    
std::adjacent_find    
std::count    
std::count_if    
std::equal    
std::find    
std::find_if    
std::find_first_of    
std::for_each    
std::generate    
std::generate_n    
std::lexicographical_compare    
std::mismatch    
std::search    
std::search_n    
std::transform    
std::replace    
std::replace_if    
std::max_element    
std::merge    
std::min_element    
std::nth_element    
std::partial_sort    
std::partition    
std::random_shuffle    
std::set_union    
std::set_intersection    
std::set_symmetric_difference    
std::set_difference    
std::sort    
std::stable_sort    
std::unique_copy

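To do that, just define _GLIBCXX_PARALLEL during compilation. I feel that this covers quite a lot of what one would want to do with numeric arrays. Of course: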

Note that the _GLIBCXX_PARALLEL define may change the sizes and behavior of standard class templates such as std::search, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, and cannot be confused with normal mode symbols.

(From the libstdc++ parallel mode documentation.)
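As a rough illustration of how this could be used with std::valarray: the algorithms in the list take iterators, and std::begin / std::end work on a valarray since C++11. The helper name sum_of_sqrt below is made up, and the build line assumes a GCC whose libstdc++ ships parallel mode; parallel mode needs OpenMP, hence -fopenmp. Whether a given call actually runs in parallel is decided by the library, for example based on the input size.

```cpp
// Possible build line (GCC with libstdc++), stated as an assumption:
//   g++ -std=c++11 -O2 -fopenmp -D_GLIBCXX_PARALLEL example.cpp
// Without these flags the same code compiles and runs sequentially.
#include <algorithm>  // std::transform
#include <cmath>      // std::sqrt
#include <numeric>    // std::accumulate
#include <valarray>   // std::valarray, std::begin, std::end

// Hypothetical helper: returns the sum of the square roots of v's elements,
// using only algorithms that appear in the parallel mode list.
double sum_of_sqrt(const std::valarray<double>& v)
{
    std::valarray<double> roots(v.size());
    std::transform(std::begin(v), std::end(v), std::begin(roots),
                   [](double x) { return std::sqrt(x); });
    // A parallel reduction may add the terms in a different order, so the
    // floating-point result can differ slightly from a sequential sum.
    return std::accumulate(std::begin(roots), std::end(roots), 0.0);
}
```

The source code does not change at all between the two modes, which is what makes this option cheap to try.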

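Another tool that can help you parallelize is Intel Advisor. It is more advanced, and I believe it can also handle your loops (I have never used it myself), but of course it is proprietary software.

For linear algebra operations, you could also look for a good parallel LAPACK implementation.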


