OpenMP and resource management in C++
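I have a resource that needs to keep its state between accesses. When parallelizing the program with OpenMP, I want to make sure that each thread has its own copy and that the instances are not destroyed and recreated for every parallel region. To do this, I use a global variable declared threadprivate. Below is a simple test case that illustrates the setup.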
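I have two questions:
- Is it guaranteed that the resource (obj below) is created only once per thread during the program's execution?
- When I run the example program on four threads, every thread reports "Obj created..." and "State set to...", but only thread zero reports "Obj destroyed...". What is going on?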
#ifdef _OPENMP
#include <omp.h>
#endif
#include <vector>
#include <iostream>
#include <iomanip>

class obj {
public:
    obj() : state(0) {
        res = new int [100];
        #pragma omp critical
        {
            std::cout << "Obj created, state " << state;
#ifdef _OPENMP
            std::cout << ", thread " << omp_get_thread_num();
#endif
            std::cout << std::endl;
        }
    }
    ~obj() {
        delete[] res;
        #pragma omp critical
        {
            std::cout << "Obj destroyed, state " << state;
#ifdef _OPENMP
            std::cout << ", thread " << omp_get_thread_num();
#endif
            std::cout << std::endl;
        }
    }
    void init(int set) {
        state = set;
        #pragma omp critical
        {
            std::cout << "State set to " << state;
#ifdef _OPENMP
            std::cout << ", thread " << omp_get_thread_num();
#endif
            std::cout << std::endl;
        }
    }
    int operator()() {
        return ++state;
    }
private:
    int state;
    int* res;
};

extern obj obj1;
#pragma omp threadprivate(obj1)
obj obj1;

void init() {
#ifdef _OPENMP
    #pragma omp parallel
    {
        obj1.init(100 * omp_get_thread_num());
    }
#else
    obj1.init(100);
#endif
}

void work() {
    std::cout << "Computing" << std::endl;
    int constexpr length = 20;
    std::vector<int> vec(length);
    #pragma omp parallel for
    for (int idx = 0; idx < length; idx++) {
        vec[idx] = obj1();
    }
    std::cout.fill('0');
    for (auto const & e: vec) {
        std::cout << std::setw(3) << e << ' ';
    }
    std::cout << std::endl;
}

int main() {
    init();
    work();
    work();
    work();
}
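threadprivate works correctly under the following conditions:
- #pragma omp threadprivate is present after each variable declaration;
- dynamic threads are turned off with omp_set_dynamic(false) (the default is implementation-defined).
See the example here.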
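I would not rely on the destructor being called at all. OpenMP leaves a lot unspecified, and the compiler may optimize the call away.
Here is an excerpt from the OpenMP specification (v4.0 p.12.14.2):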
The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.
and
The order in which any destructors for different threadprivate C++ variables of class type are called is unspecified.
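Since neither the timing nor the order of threadprivate destructors is specified, one option is to release the per-thread resource explicitly, from the same threads that own it, before the program ends. The sketch below assumes that approach and uses a plain threadprivate pointer instead of the class from the question; tp_buffer, fix_team, acquire_all and release_all are hypothetical names. The team is fixed so that both calls see the same threads and therefore the same copies.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical per-thread resource, standing in for obj::res in the question.
static int* tp_buffer = nullptr;
#pragma omp threadprivate(tp_buffer)

// Disables dynamic threads and fixes the team size for later regions.
static void fix_team(int nthreads) {
#ifdef _OPENMP
    omp_set_dynamic(0);
    omp_set_num_threads(nthreads);
#else
    (void)nthreads;  // serial build: nothing to configure
#endif
}

// Allocates each thread's buffer if it has none yet; returns how many were allocated.
int acquire_all(int nthreads) {
    fix_team(nthreads);
    int allocated = 0;
    #pragma omp parallel reduction(+:allocated)
    {
        if (tp_buffer == nullptr) {
            tp_buffer = new int[100]();  // zero-initialized buffer for this thread
            allocated += 1;
        }
    }
    return allocated;
}

// Frees each thread's buffer explicitly; returns how many were freed.
int release_all(int nthreads) {
    fix_team(nthreads);
    int released = 0;
    #pragma omp parallel reduction(+:released)
    {
        if (tp_buffer != nullptr) {
            delete[] tp_buffer;
            tp_buffer = nullptr;
            released += 1;
        }
    }
    return released;
}
```

Calling release_all(n) at the end of main, with the same n used for acquire_all(n), frees every thread's buffer at a known point instead of leaving it to whatever the runtime does at exit.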
More on threadprivate (v4.0 p2.4.12):
Each copy of a threadprivate variable is initialized once, in the manner specified by the program, but at an unspecified point in the program prior to the first reference to that copy. The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.
A program in which a thread references another thread's copy of a threadprivate variable is non-conforming.
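To see the "initialized once per copy" guarantee directly, a class-type threadprivate object can count its own constructions. This sketch assumes the compiler supports threadprivate objects with non-trivial constructors (the question's obj relies on the same support); Tracked, constructed and copies_after are hypothetical names. Running more regions with the same fixed team should not raise the count, and each thread only ever touches its own copy.

```cpp
#include <atomic>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Total number of Tracked copies constructed so far, across all threads.
static std::atomic<int> constructed{0};

// Hypothetical stand-in for obj: the constructor only records that it ran.
struct Tracked {
    int state;
    Tracked() : state(0) { constructed.fetch_add(1); }
};

Tracked tracked;
#pragma omp threadprivate(tracked)

// References the threadprivate object in `regions` consecutive parallel
// regions with a fixed team, then returns the number of constructions so far.
int copies_after(int regions, int nthreads) {
#ifdef _OPENMP
    omp_set_dynamic(0);
    omp_set_num_threads(nthreads);
#else
    (void)nthreads;  // serial build: only the initial thread's copy exists
#endif
    for (int r = 0; r < regions; ++r) {
        #pragma omp parallel
        {
            tracked.state += 1;  // each thread references only its own copy
        }
    }
    return constructed.load();
}
```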
The content of a threadprivate variable can change across a task scheduling point if the executing thread switches to another task that modifies the variable. For more details on task scheduling, see Section 1.3 on page 14 and Section 2.11 on page 113.
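A sketch of what this means in practice, assuming default (tied) tasks and the hypothetical names tp_scratch, TaskResult and sum_with_tasks: a thread waiting at a taskwait may itself run child tasks that overwrite its copy, so a value that is still needed after the scheduling point is copied into a local variable beforehand.

```cpp
#include <cassert>

// Hypothetical per-thread scratch value.
static int tp_scratch = 0;
#pragma omp threadprivate(tp_scratch)

struct TaskResult {
    long total;  // sum computed by the tasks
    int saved;   // value of tp_scratch captured before the taskwait
};

TaskResult sum_with_tasks(int n) {
    long total = 0;
    int saved = 0;
    #pragma omp parallel
    #pragma omp single
    {
        tp_scratch = 42;      // this thread's copy
        saved = tp_scratch;   // snapshot taken before the task scheduling point
        for (int i = 1; i <= n; ++i) {
            #pragma omp task shared(total)
            {
                tp_scratch = i;  // writes the copy of whichever thread runs the task
                #pragma omp atomic
                total += i;
            }
        }
        // Task scheduling point: this thread may execute some of the tasks
        // above while waiting, so its tp_scratch may no longer be 42 here.
        #pragma omp taskwait
    }
    return {total, saved};
}
```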
In parallel regions, references by the master thread will be to the copy of the variable in the thread that encountered the parallel region.
During a sequential part references will be to the initial thread's copy of the variable. The values of data in the initial thread's copy of a threadprivate variable are guaranteed to persist between any two consecutive references to the variable in the program.
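A small check of the two rules above, using the hypothetical names tp_value and master_sees_initial_copy: a value written in the sequential part is what the master thread sees inside the next (non-nested) parallel region, and the same copy is used again once the region ends.

```cpp
#include <cassert>

// Hypothetical threadprivate value.
static int tp_value = 0;
#pragma omp threadprivate(tp_value)

int master_sees_initial_copy(int v) {
    tp_value = v;   // sequential part: the initial thread's copy
    int seen = -1;
    #pragma omp parallel
    {
        #pragma omp master
        seen = tp_value;  // master: copy of the thread that encountered the region
    }
    tp_value += 1;  // sequential part again: the same copy, still holding v
    return seen;
}
```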
The values of data in the threadprivate variables of non-initial threads are guaranteed to persist between two consecutive active parallel regions only if all the following conditions hold:
- Neither parallel region is nested inside another explicit parallel region.
- The number of threads used to execute both parallel regions is the same.
- The thread affinity policies used to execute both parallel regions are the same.
- The value of the dyn-var internal control variable in the enclosing task region is false at entry to both parallel regions.
If these conditions all hold, and if a threadprivate variable is referenced in both regions, then threads with the same thread number in their respective regions will reference the same copy of that variable.
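The four conditions can be set up explicitly and checked by recording the address of each thread's copy in two consecutive regions. This is a sketch with hypothetical names (tp_accum, same_copy_across_regions); it must be called from sequential code, and it only compares the recorded addresses afterwards, so no thread reads another thread's copy.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical threadprivate accumulator.
static double tp_accum = 0.0;
#pragma omp threadprivate(tp_accum)

bool same_copy_across_regions(int nthreads) {
#ifdef _OPENMP
    omp_set_dynamic(0);  // condition 4: dyn-var is false at entry to both regions
#else
    nthreads = 1;        // serial build: one thread, one copy
#endif
    std::vector<const void*> first(nthreads, nullptr);
    std::vector<const void*> second(nthreads, nullptr);

    // Condition 1: both regions are entered from sequential code, not nested.
    // Condition 2: num_threads gives both regions the same team size.
    // Condition 3: no proc_bind clause, so both use the same affinity policy.
    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        first[tid] = &tp_accum;   // address of this thread's own copy
    }
    #pragma omp parallel num_threads(nthreads)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        second[tid] = &tp_accum;
    }

    // Same thread number in both regions should map to the same copy.
    for (int t = 0; t < nthreads; ++t) {
        if (first[t] == nullptr || first[t] != second[t]) return false;
    }
    return true;
}
```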