Writing my own version of MPI_Allreduce in C - why does my code hang indefinitely?

I am trying to write my own version of MPI_Allreduce in C, but only for power-of-two sizes, i.e. size = 2, 4, 8, 16, ..., and only for the MPI_INT datatype. My code so far is:

int tree_allreduce(const int *sendbuf, int *recvbuf, int count, MPI_Op op, MPI_Comm comm){

  // Create variables for rank and size
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  // While size is greater than 1 there are 2 or more ranks to operate on
  while(size > 1){  // While loop active until size=1, when the only process remaining is rank 0
    if(rank < size){  // Filter out odd ranks, which are always >= size after sending their data to their left (even) neighbour
      if( (rank % 2) != 0 ){ // If rank is odd
        MPI_Send(sendbuf, count, MPI_INT, rank-1, rank, comm);  // Send contents of sendbuf to rank-1, using the rank of the odd process as tag
        rank *= size;  // Multiplying odd ranks by size ensures they are always >= size when if(rank < size) is checked on the next while iteration
      }
      else{  // If rank is even
        // For an even rank, its own values are in sendbuf and the odd rank's values are received into recvbuf.
        MPI_Recv(recvbuf, count, MPI_INT, rank+1, rank+1, comm, MPI_STATUS_IGNORE);  // Receive contents of sendbuf from rank+1 into recvbuf
        rank /= 2;  // Halve the rank so for the next iteration of the while loop rank 0 --> rank 0, rank 2 --> rank 1, rank 4 --> rank 2, etc...
        MPI_Reduce_local(sendbuf, recvbuf, count, MPI_INT, op);  // Use MPI_Reduce_local to do the SUM/PROD/MIN/MAX operation and store the result in recvbuf
      }
    }
    size /= 2;  // Halve the size to reflect the processes contracting pairwise
  }

  // Broadcast result back to all processes
  MPI_Bcast(recvbuf, count, MPI_INT, 0, comm);

  return 0;
}
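
For context, a minimal driver along these lines (the names value, mine and reference are just illustrative, not from the original code) is enough to reproduce the behaviour described below: it works with 2 processes and hangs with more.

#include <mpi.h>
#include <stdio.h>

int tree_allreduce(const int *sendbuf, int *recvbuf, int count, MPI_Op op, MPI_Comm comm);

int main(int argc, char **argv){
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int value = rank + 1;          // every rank contributes a different value
  int mine = 0, reference = 0;

  tree_allreduce(&value, &mine, 1, MPI_SUM, MPI_COMM_WORLD);               // routine under test
  MPI_Allreduce(&value, &reference, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);  // known-good result

  printf("rank %d: tree_allreduce = %d, MPI_Allreduce = %d\n", rank, mine, reference);

  MPI_Finalize();
  return 0;
}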

This works fine for size 2, but for any larger size the code hangs indefinitely and I can't figure out why. I suspect I'm making some novice MPI mistake, so please let me know where I'm going wrong.

Suppose you have 8 processes ("rank var" is what is stored in your variable rank, "rank actual" is the actual rank of the worker).

rank var   |01234567
rank actual|01234567

The first iteration works fine, and the data is sent according to this scheme:

  0      1      2      3      4      5      6      7
rcv(1) snd(0) rcv(3) snd(2) rcv(5) snd(4) rcv(7) snd(6)

After that, you knock out the odd workers with the line rank *= size and update the rank variable on the even ones with rank /= 2:

rank var   |0_1_2_3_
rank actual|01234567

In the next iteration the data is sent according to this scheme:

  0      -      2      -      4      -      6      -
rcv(1)   -    snd(0)   -    rcv(3)   -    snd(2)   -

As you can see, it is a mess: workers are waiting for data that is never sent to them. Actual rank 0, for example, posts a receive for a message from actual rank 1, which has already dropped out, while the data it should be combining now comes from actual rank 2, so the receive never matches and the program hangs.
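
One common way around this (a sketch of a fix, not the only one) is to leave the real rank untouched and instead double the distance to the partner each round, so every MPI_Send/MPI_Recv targets an actual rank in the communicator. Something along these lines, assuming as in the question that size is a power of two and the data type is MPI_INT:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int tree_allreduce_fixed(const int *sendbuf, int *recvbuf, int count,
                         MPI_Op op, MPI_Comm comm)
{
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  memcpy(recvbuf, sendbuf, count * sizeof(int));  // running partial result
  int *tmp = malloc(count * sizeof(int));         // scratch space for incoming data

  for (int step = 1; step < size; step *= 2) {
    if (rank % (2 * step) != 0) {
      // "Odd" at this level: send the partial result to the partner
      // step positions to the left, then drop out of the reduction.
      MPI_Send(recvbuf, count, MPI_INT, rank - step, 0, comm);
      break;
    }
    // "Even" at this level: receive the partner's partial result and fold it into ours.
    MPI_Recv(tmp, count, MPI_INT, rank + step, 0, comm, MPI_STATUS_IGNORE);
    MPI_Reduce_local(tmp, recvbuf, count, MPI_INT, op);
  }

  free(tmp);

  // Rank 0 now holds the full reduction; share it with everyone.
  MPI_Bcast(recvbuf, count, MPI_INT, 0, comm);

  return 0;
}

The key difference is that the loop variable step changes between rounds instead of rank, so rank - step and rank + step are always valid ranks in comm and every send has a matching receive.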