OpenMP and resource management in C++

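I have a resource that needs to keep its state between accesses. When parallelizing the program with OpenMP, I want to make sure that each thread has its own copy and that the instance is not destroyed and re-created for every parallel region. To achieve this, I use a global variable that is declared threadprivate. Below is a simple test case that illustrates the setup.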

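I have two questions:

  1. Is it guaranteed that the resource (obj below) is created only once per thread during the execution of the program?
  2. When I run the example program on four threads, every thread reports "Obj created..." and "State set to...", but only thread zero reports "Obj destroyed...". What is going on?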

#ifdef _OPENMP
#include <omp.h>
#endif
#include <vector>
#include <iostream>
#include <iomanip>

class obj {
public:
  obj() : state(0) {
    res = new int [100];
#pragma omp critical
    {
      std::cout << "Obj created, state " << state;
#ifdef _OPENMP
      std::cout << ", thread " << omp_get_thread_num();
#endif
      std::cout << std::endl;    
    }
  }

  ~obj() {
    delete[] res;
#pragma omp critical
    {
      std::cout << "Obj destroyed, state " << state;
#ifdef _OPENMP
      std::cout << ", thread " << omp_get_thread_num();
#endif
      std::cout << std::endl;
    }
  }

  void init(int set) {
    state = set;
#pragma omp critical
    {
      std::cout << "State set to " << state;
#ifdef _OPENMP
      std::cout << ", thread " << omp_get_thread_num();
#endif
      std::cout << std::endl;    
    }
  }

  int operator()() {
    return ++state;
  }

private:
  int state;
  int* res;
};

extern obj obj1;
#pragma omp threadprivate(obj1)
obj obj1;

void init() {
#ifdef _OPENMP
#pragma omp parallel
  {
  obj1.init(100 * omp_get_thread_num());
  }
#else
  obj1.init(100);
#endif  
}

void work() {
  std::cout << "Computing" << std::endl;

  int constexpr length = 20;
  std::vector<int> vec(length);

#pragma omp parallel for
  for (int idx = 0; idx < length; idx++) {
    vec[idx] = obj1();
  }

  std::cout.fill('0');

  for (auto const & e: vec) {
    std::cout << std::setw(3) << e << ' ';
  }
  std::cout << std::endl;
}

int main() {
  init();
  work();
  work();
  work();
}

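threadprivate works properly under the following conditions:

  • the #pragma omp threadprivate directive is present after each declaration of the variable;
  • dynamic threads are turned off (the default is implementation-defined), so you have to call omp_set_dynamic(false).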

See the example here.

I would not rely on the destructors being called at all. OpenMP leaves a lot unspecified, and the compiler may optimize them away.
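One way to avoid depending on destructor calls is to release the resource explicitly while the worker threads still exist. The following is only a minimal sketch under assumptions: the `resource` class, its `acquire()`/`release()` members, and the helpers `acquire_all()`/`release_all()` are hypothetical names that do not appear in the question's code, and the approach relies on both parallel regions being executed by the same team of threads.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical resource holder (not the question's obj class).
class resource {
public:
  // Allocate the buffer if this thread's copy does not hold one yet.
  void acquire() {
    if (!buf) buf = new int[100]();
  }

  // Free the buffer; returns true if this copy actually held one.
  bool release() {
    bool had = (buf != nullptr);
    delete[] buf;
    buf = nullptr;
    return had;
  }

private:
  int* buf = nullptr;
};

static resource res1;
#pragma omp threadprivate(res1)

// Every thread of the team acquires its own buffer.
void acquire_all() {
#ifdef _OPENMP
  omp_set_dynamic(0);  // keep the team size stable between regions
#endif
#pragma omp parallel
  {
    res1.acquire();
  }
}

// Every thread of the team frees its own buffer.
// Returns how many copies were released by this call.
int release_all() {
  int released = 0;
#pragma omp parallel reduction(+ : released)
  {
    if (res1.release()) ++released;
  }
  return released;
}
```

Calling `acquire_all()` once at startup and `release_all()` before `main` returns frees the buffers without depending on when, or whether, the runtime runs the destructors of the non-initial threads' copies.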

Here is an excerpt from the OpenMP specification (v4.0, Section 2.14.2):

The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.

The order in which any destructors for different threadprivate C++ variables of class type are called is unspecified.

More on threadprivate (v4.0, Section 2.14.2):

Each copy of a threadprivate variable is initialized once, in the manner specified by the program, but at an unspecified point in the program prior to the first reference to that copy. The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.

A program in which a thread references another thread’s copy of a threadprivate variable is non-conforming.

The content of a threadprivate variable can change across a task scheduling point if the executing thread switches to another task that modifies the variable. For more details on task scheduling, see Section 1.3 on page 14 and Section 2.11 on page 113.

In parallel regions, references by the master thread will be to the copy of the variable in the thread that encountered the parallel region.

During a sequential part references will be to the initial thread’s copy of the variable. The values of data in the initial thread’s copy of a threadprivate variable are guaranteed to persist between any two consecutive references to the variable in the program.

The values of data in the threadprivate variables of non-initial threads are guaranteed to persist between two consecutive active parallel regions only if all the following conditions hold:

  • Neither parallel region is nested inside another explicit parallel region.

  • The number of threads used to execute both parallel regions is the same.

  • The thread affinity policies used to execute both parallel regions are the same.

  • The value of the dyn-var internal control variable in the enclosing task region is false at entry to both parallel regions.

If these conditions all hold, and if a threadprivate variable is referenced in both regions, then threads with the same thread number in their respective regions will reference the same copy of that variable.
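The persistence conditions quoted above can be checked with a small experiment. This is a hedged sketch, not part of the question's code: `thread_state`, `thread_id()` and `values_persist()` are hypothetical names, both regions are non-nested and use the same thread count, dynamic threads are disabled, and the affinity policy is left unchanged. The `#ifdef _OPENMP` guards let it build serially as well.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical per-thread value; the directive follows the declaration.
static int thread_state = 0;
#pragma omp threadprivate(thread_state)

// Thread number inside a parallel region, or 0 in a serial build.
static int thread_id() {
#ifdef _OPENMP
  return omp_get_thread_num();
#else
  return 0;
#endif
}

// Stores a per-thread value in one parallel region and checks it in a
// second one. Returns true if every thread found the value it had stored.
bool values_persist(int num_threads) {
#ifdef _OPENMP
  omp_set_dynamic(0);                // dyn-var must be false
  omp_set_num_threads(num_threads);  // same team size for both regions
#else
  (void)num_threads;                 // unused in a serial build
#endif

#pragma omp parallel
  {
    thread_state = 100 + thread_id();
  }

  int mismatches = 0;
#pragma omp parallel reduction(+ : mismatches)
  {
    if (thread_state != 100 + thread_id()) ++mismatches;
  }
  return mismatches == 0;
}
```

If any of the conditions is violated, for example by changing the thread count between the two regions, the specification no longer guarantees that the check succeeds.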