OpenMP: How to correctly nest both MASTER and FOR in a PARALLEL block?

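I am developing a program using OpenMP and OpenMPI.

For the process running on the initial node, I want one thread to act as a scheduler (interacting with the other nodes) and the other threads to do the computation.

The code structure looks like this: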

int computation(...)
{
    #pragma omp parallel for .....
}

int main(...)
{
    ...
    if (mpi_rank == 0) // initial node
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                // task scheduling for other nodes
            }
            {
                // WRONG: with, say, 4 threads in total, this block is executed
                // 3 times simultaneously, and the nested "for" in the function
                // spawns 4 threads each time as well,
                // so there are ACTUALLY 3*4+1=13 threads here!
                computation(...);
            }
        }
    }
    else // other nodes
    {
        // get a task from node 0 scheduler by MPI
        computation(...);
    }
}

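What I want is for the scheduler to occupy one thread on the initial node while only one computation function runs at a time, so that at most 4 threads are in use simultaneously.

I have also tried: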

int computation(...)
{
    register int thread_use = omp_get_max_threads();    // this is 4
    if (rank == 0)
    {
        --thread_use;   // if initial node, use 3
    }
    #pragma omp parallel for ..... num_threads(thread_use)
}

int main(...)
{
    ...
    if (mpi_rank == 0) // initial node
    {
        #pragma omp parallel
        {
            #pragma omp master
            {
                // task scheduling for other nodes
            }
            #pragma omp single
            {
                // WRONG: the nested "for" can only use 1 thread
                computation(...);
            }
        }
    }
    else // other nodes
    {
        // get a task from node 0 scheduler by MPI
        computation(...);
    }
}

...or

//other parts are the same as above
if (mpi_rank == 0) // initial node
{
    #pragma omp parallel num_threads(2)
    {
        #pragma omp master
        {
            // task scheduling for other nodes
        }
        {
            // WRONG: the nested "for" can only use 1 thread
            computation(...);
        }
    }
}

...but none of them works.

How should I arrange the blocks with OpenMP to achieve my goal? Any help would be greatly appreciated. Thank you very much.

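First, if you want to use nested parallelism in OpenMP, you need to set the environment variable OMP_NESTED to true. (OpenMP 5.0 and later deprecate it in favor of OMP_MAX_ACTIVE_LEVELS.)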
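If you would rather not depend on the environment, nesting can also be requested from inside the program. The following is only a minimal sketch: enable_nested_parallelism is a hypothetical helper name, and it uses the standard omp_set_max_active_levels() and omp_get_max_active_levels() routines. The _OPENMP guard is an assumption so that the file still builds when OpenMP is switched off.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper (not part of the original code): ask the runtime to
   allow up to `levels` nested active parallel regions.
   Returns the number of active levels the runtime reports afterwards,
   or 1 when the file is compiled without OpenMP support. */
int enable_nested_parallelism(int levels)
{
#ifdef _OPENMP
    omp_set_max_active_levels(levels);   /* 2 = outer region + one nested region */
    return omp_get_max_active_levels();
#else
    (void)levels;                        /* serial build: no nesting available */
    return 1;
#endif
}
```

Calling enable_nested_parallelism(2) once, before the first parallel region, should be enough for a two-level layout. The runtime may cap the value at whatever it supports.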

Then, a possible implementation looks like this:

// Parallel region. Topmost level
#pragma omp parallel sections num_threads(2)
{
    #pragma omp section
    scheduling_function();

    #pragma omp section
    compute_function();
}

where scheduling_function() is a single-threaded function and compute_function() is structured like this:

void compute_function() {
    // Nested parallel region. Bottommost level
    #pragma omp parallel
    {
        computation();
    }
}
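Putting the two pieces together, here is a rough, self-contained sketch of how the thread budget could be kept at the total the question asks for. It makes several assumptions that are not in the original code: the scheduler and the computation are replaced by trivial placeholders (a flag and an array sum), run_initial_node and its parameters are hypothetical names, and the inner region is written as a parallel for with an explicit num_threads clause so that the scheduler thread plus the workers add up to total_threads.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Placeholder for the single-threaded scheduler: it only records that it ran. */
static void scheduling_function(int *scheduler_ran)
{
    *scheduler_ran = 1;
}

/* Placeholder computation: sums an array inside a nested parallel region
   limited to `workers` threads. */
static long compute_function(const int *data, int n, int workers)
{
    long sum = 0;
    /* Nested parallel region. Bottommost level */
    #pragma omp parallel for num_threads(workers) reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}

/* Hypothetical driver for the initial node: one thread schedules while the
   other section computes with the remaining total_threads - 1 threads. */
long run_initial_node(const int *data, int n, int total_threads,
                      int *scheduler_ran)
{
    long result = 0;
    int workers = total_threads > 1 ? total_threads - 1 : 1;

#ifdef _OPENMP
    omp_set_max_active_levels(2);   /* allow the nested region to be active */
#endif

    /* Parallel region. Topmost level */
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        scheduling_function(scheduler_ran);

        #pragma omp section
        result = compute_function(data, n, workers);
    }
    return result;   /* safe to read: the sections construct ends with a barrier */
}
```

With total_threads set to 4, the outer region uses 2 threads and the thread running the compute section becomes the master of a team of 3, so at most 4 threads are busy at once. Note that in this layout the scheduler thread simply waits at the end of the sections construct once it finishes.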

For more information, see the documentation on OpenMP nested parallelism.