How to send the last element of each processor's array in MPI

I am having a hard time writing code that performs the following example, which resembles the up-sweep phase of a prefix scan, without using the MPI_Scan function:

WholeArray[16] = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

Processor 0 got [0 , 1 , 2 , 3] , Processor 1 got [4 , 5 , 6 , 7] 

Processor 2 got [8 , 9 , 10 , 11] , Processor 3 got [12 , 13 , 14 , 15] 

The last element of each block is sent and summed over 2 strides:

(stride 1)

Processor 0 sends Array[3], Processor 1 receives it from Processor 0 and adds it to its Array[3]

Processor 2 sends Array[3], Processor 3 receives it from Processor 2 and adds it to its Array[3]

(stride 2)

Processor 1 sends Array[3], Processor 3 receives it from Processor 1 and adds it to its Array[3]

Finally, I want to use MPI_Gather so that the result becomes (the 10 comes from 3 + 7, and the 36 from 3 + 7 + 11 + 15):

WholeArray = [0, 1, 2, 3, 4, 5, 6, 10, 8, 9, 10, 11, 12, 13, 14, 36]

I find it hard to write code that makes the program behave like the 4-node example below:

(1st stride) - Processor 0 sends to Processor 1 and Processor 1 receives from Processor 0
(1st stride) - Processor 2 sends to Processor 3 and Processor 3 receives from Processor 2

(2nd stride) - Processor 1 sends to Processor 3 and Processor 3 receives from Processor 1

Here is the code I have written so far:

int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = my_input[size_per_process-1];

    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
                my_input[size_per_process]+= temp;
            }
            key = 2 * key;
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    return (*my_input);
}

There are a few problems in your code, namely: 1) it always sends the same temp variable across processes.

MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);

The temp variable is initialized before the loop:

 int temp = my_input[size_per_process-1];
 while(key <= size/2)
 { ...}

but it is never updated inside the loop. This leads to wrong results because, after the first stride, the last element of the my_input array will have changed on some processes. Instead, you should do:

temp = localdata[size_per_process-1];
MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
                            

Furthermore, 2) the statement below

my_input[size_per_process]+= temp;

does not add temp to the last position of the my_input array; it writes one element past its end. Instead, it should be:

my_input[size_per_process-1]+= temp;

Finally, 3) there are deadlock and infinite-loop problems. For starters, calling a collective communication routine (e.g., MPI_Barrier) inside a conditional that only some processes enter is typically a big red flag. Instead of:

while(key <= size/2)
{
   if((rank+1) % key == 0){
       ...
       MPI_Barrier(MPI_COMM_WORLD);
   }
}

you should have:

while(key <= size/2)
{
   if((rank+1) % key == 0){
       ...
   }
   MPI_Barrier(MPI_COMM_WORLD);
}

so that every process calls MPI_Barrier. MPI_Barrier is a collective operation: if only some processes in the communicator reach it, those that do will block forever waiting for the rest, i.e., a deadlock.

The infinite loop happens because the while condition depends on key being updated, but key is only updated when if((rank+1) % key == 0) evaluates to true. Consequently, a process for which if((rank+1) % key == 0) evaluates to false never updates key and is stuck in an infinite loop.
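
To make the participation pattern concrete, here is a small standalone sketch (plain C, no MPI, not taken from your code) that prints which ranks pair up at each stride, assuming 4 ranks:

#include <stdio.h>

/* Standalone sketch (assumption: size = 4 ranks, a power of two):
   prints the send/receive pairs of the up-sweep at each stride. */
int main(void)
{
    int size = 4;                               /* assumed number of processes */
    for (int key = 1; key <= size / 2; key *= 2) {
        for (int rank = 0; rank < size; rank++) {
            if ((rank + 1) % key == 0) {        /* this rank participates at this stride */
                if ((rank / key) % 2 == 0)
                    printf("stride %d: rank %d sends to rank %d\n",
                           key, rank, rank + key);
                else
                    printf("stride %d: rank %d receives from rank %d\n",
                           key, rank, rank - key);
            }
        }
    }
    return 0;
}

With size = 4 this prints exactly the pattern from the question: 0→1 and 2→3 at stride 1, then 1→3 at stride 2. Ranks 0 and 2 fail the condition at stride 2, which is why key = 2 * key and the barrier must be executed by every rank, outside the conditional.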

Here is a running example with all of those issues fixed:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv){
    int rank, mpisize, total_size = 16;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpisize);
    int *data = NULL;   

    if(rank == 0){
       data = malloc(total_size * sizeof(int));
       for(int i = 0; i < total_size; i++)
          data[i] = i;
    }
    int size_per_process = total_size / mpisize;
    int *localdata = malloc(size_per_process * sizeof(int));
    MPI_Scatter(data, size_per_process, MPI_INT, localdata, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);

    /* Up-sweep over the last local elements: at each stride, the left rank
       of a pair sends its last element and the right rank adds it to its own. */
    int key = 1;
    int temp = 0;
    while(key <= mpisize/2){
      if((rank+1) % key == 0){
          if(rank/key % 2 == 0){
             /* re-read the last element on every stride, it may have changed */
             temp = localdata[size_per_process-1];
             MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
          }
          else {
             MPI_Recv(&temp, 1, MPI_INT, rank-key, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
             localdata[size_per_process-1]+= temp;
          }
      }
      key = 2 * key;               /* every rank advances the stride */
      MPI_Barrier(MPI_COMM_WORLD); /* and every rank reaches the barrier */
    }

    MPI_Gather(localdata, size_per_process, MPI_INT, data, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);

    if(rank == 0){
       for(int i = 0; i < total_size; i++)
               printf("%d ", data[i]);
       printf("\n");
    }
    free(data);
    free(localdata);    
    MPI_Finalize();
    return 0;
}

Input:

[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]

Output:

[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]
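
Note that this output assumes the program is launched with 4 processes (e.g., mpirun -np 4); with a different power-of-two process count the partial sums end up in different positions of the gathered array.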