Memory-efficient MPI parallel implementation of element-wise sum of multiple arrays

Question

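I am new to MPI and I want to compute the element-wise sum of two (or more) large arrays. A naive implementation of this task looks like this (I apologize for any errors in this pseudocode. Please let me know if the standard implementation is very different):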

#include<stdio.h>
#include<stdlib.h>
#include<mpi.h>
#define NELEMS 10
#define NP 2
#define TAG 15

int setElements(int,int);

int postProcess(int *);

int main(int argc,char **argv){
    int nprocs,myrank,*myarray,*result=NULL,i,initflag=0;
    MPI_Status status;
    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
    myarray=malloc(sizeof(int)*NELEMS);
    if(myrank==0){
        result=malloc(sizeof(int)*NELEMS);
        for(i=0;i<NELEMS;i++){
            result[i]=0;
        }
        initflag=1;
        for(i=1;i<nprocs;i++){
            MPI_Send(&initflag,1,MPI_INT,i,TAG,MPI_COMM_WORLD);
        }
    }
    for(i=0;i<NELEMS;i++){
        myarray[i]=setElements(myrank,i);
    }
    if(myrank!=0){
        MPI_Recv(&initflag,1,MPI_INT,0,TAG,MPI_COMM_WORLD,&status);
    }
    MPI_Reduce(myarray,result,NELEMS,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
    MPI_Finalize();
    free(myarray);
    /*At this point the element-wise sum should be stored on myrank==0*/
    if(myrank==0){
        postProcess(result);
    }
    free(result);
    return 0;
}

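My concern is that this implementation requires myrank==0 to allocate two arrays of size NELEMS (versus one for myrank!=0), which can be a problem for large NELEMS. A simple workaround (if available) would be to use one more processor and keep myrank==0 idle until the setElements loop has finished on the other processes, but that does not seem like a very efficient use of processors, especially when setElements is computationally expensive.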

So my question is: is there a smarter way to compute the element-wise sum of large arrays with MPI? Or should I consider a radically different strategy?

Answer 1

You can perform the reduction in place on the root process. Quoting the documentation:

When the communicator is an intracommunicator, you can perform a reduce operation in-place (the output buffer is used as the input buffer). Use the variable MPI_IN_PLACE as the value of the root process sendbuf. In this case, the input data is taken at the root from the receive buffer, where it will be replaced by the output data.

In your case, change the MPI_Reduce call from:

MPI_Reduce(myarray,result,NELEMS,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);

to:

if (myrank == 0)
  MPI_Reduce(MPI_IN_PLACE,myarray,NELEMS,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);
else
  MPI_Reduce(myarray,NULL,NELEMS,MPI_INT,MPI_SUM,0,MPI_COMM_WORLD);

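With this approach, you do not need the second array (result) at all. On rank 0 the sum ends up in myarray, overwriting that rank's own contribution, so that is the buffer to pass to postProcess.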
