How is OpenMP communicating between threads with what should be a private variable?

Question

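I am writing some C++ code that uses OpenMP to parallelize a few blocks. I ran into some strange behavior that I can't explain. I have rewritten my code into a minimal example that reproduces the problem.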

First, here is a function I wrote that is meant to be run inside a parallel region.

#include <cstdio>
#include <omp.h>

void foo()
{
    #pragma omp for
    for (int i = 0; i < 3; i++)
    {
        #pragma omp critical
        printf("Hello %d from thread %d.\n", i, omp_get_thread_num());
    }
}

And here is my whole program.

int main()
{
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        for (int i = 0; i < 2; i++)
        {
            foo();
            #pragma omp critical
            printf("%d\n", i);
        }
    }
    return 0;
}

When I compile and run this code (with g++ -std=c++17), I get the following output on the terminal:

Hello 0 from thread 0.
Hello 1 from thread 1.
Hello 2 from thread 2.
0
0
Hello 2 from thread 2.
Hello 1 from thread 1.
0
Hello 0 from thread 0.
0
1
1
1
1

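i is a private variable. I expect the function foo to run twice per thread, so I would expect to see eight "Hello %d from thread %d.\n" statements in the terminal, just as I see eight numbers printed for i. So what gives? Why does OMP behave so differently within the same loop?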

Answer 1

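This is because #pragma omp for is a work-sharing construct, so it distributes the loop iterations among the threads of the team. The number of threads does not matter here, only the total number of loop iterations (2*3=6).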

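If you use omp_set_num_threads(1);, you will still see 6 outputs. If you use more threads than there are loop iterations, some threads will be idle in the inner loop, but you will still see exactly 6 outputs.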
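
The claim above can be checked with a small sketch that counts loop-body executions instead of printing them. It assumes the code is built with OpenMP enabled (for g++, the -fopenmp flag); the helper name count_shared_iterations is hypothetical and not part of the question's code.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs the question's loop structure with the requested
// number of threads and returns how often the inner loop body executed.
int count_shared_iterations(int num_threads)
{
    int executed = 0;  // shared by all threads of the team

    #pragma omp parallel num_threads(num_threads)
    {
        // Every thread runs this outer loop twice on its own.
        for (int outer = 0; outer < 2; outer++)
        {
            // Work-sharing: the three iterations are divided among the team,
            // so each iteration runs exactly once per outer pass.
            #pragma omp for
            for (int i = 0; i < 3; i++)
            {
                #pragma omp atomic
                executed++;
            }
        }
    }
    return executed;
}
```

Calling count_shared_iterations with 1, 4, or 8 threads should return 6 each time, since only the 2*3 iterations determine the total.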

On the other hand, if you remove the #pragma omp for line, you will see (number of threads)*2*3 outputs (=24 with 4 threads).
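
For comparison, here is a sketch of the same loop nest without the work-sharing directive, again assuming an OpenMP-enabled build. The names RunCounts and count_replicated_iterations are hypothetical. The team size is measured rather than assumed, because the runtime may provide fewer threads than requested.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type: actual team size and number of loop-body runs.
struct RunCounts
{
    int team_size;
    int executed;
};

RunCounts count_replicated_iterations(int num_threads)
{
    int team_size = 1;  // stays 1 if OpenMP is not enabled
    int executed = 0;   // shared by all threads of the team

    #pragma omp parallel num_threads(num_threads)
    {
#ifdef _OPENMP
        // One thread records how many threads the runtime actually provided.
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        // No "omp for" here: every thread runs all 2*3 iterations itself.
        for (int outer = 0; outer < 2; outer++)
        {
            for (int i = 0; i < 3; i++)
            {
                #pragma omp atomic
                executed++;
            }
        }
    }
    return {team_size, executed};
}
```

The returned executed count should equal team_size * 2 * 3, which is 24 when the runtime really supplies 4 threads.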

Answer 2

From the documentation of omp parallel:

Each thread in the team executes all statements within a parallel region except for work-sharing constructs.

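Emphasis mine, on "except for work-sharing constructs". Since the omp for in foo is a work-sharing construct, each of its iterations is executed only once per outer iteration, no matter how many threads run the parallel block in main.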
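
To make the distinction concrete, the following sketch counts two things separately: how often foo is entered and how often its loop body runs. It assumes an OpenMP-enabled build; counting_foo, EncounterCounts, and count_encounters are hypothetical names, and the counters replace the question's printf calls.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type for this illustration.
struct EncounterCounts
{
    int team_size;
    int foo_calls;
    int loop_bodies;
};

// Like the question's foo, but counting instead of printing.
static void counting_foo(int &foo_calls, int &loop_bodies)
{
    // Executed by every thread that calls the function.
    #pragma omp atomic
    foo_calls++;

    // Orphaned work-sharing loop: iterations are divided among the team.
    #pragma omp for
    for (int i = 0; i < 3; i++)
    {
        #pragma omp atomic
        loop_bodies++;
    }
}

EncounterCounts count_encounters(int num_threads)
{
    int team_size = 1, foo_calls = 0, loop_bodies = 0;

    #pragma omp parallel num_threads(num_threads)
    {
#ifdef _OPENMP
        #pragma omp single
        team_size = omp_get_num_threads();
#endif
        for (int outer = 0; outer < 2; outer++)
        {
            counting_foo(foo_calls, loop_bodies);
        }
    }
    return {team_size, foo_calls, loop_bodies};
}
```

Here foo_calls should come out as team_size * 2, because every thread does call the function twice, while loop_bodies should stay at 6, because only the loop inside it is shared out.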

c++

multithreading

openmp