Writing my own version of MPI_Allreduce in C - why does my code hang indefinitely?

I am trying to write my own version of MPI_Allreduce in C, but only for power-of-two sizes, i.e. size = 2, 4, 8, 16, ..., and only for the MPI_INT datatype. My code so far is:

int tree_allreduce(const int *sendbuf, int *recvbuf, int count, MPI_Op op, MPI_Comm comm){

  // Create variables for rank and size
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  // While size is greater than 1 there are 2 or more ranks to operate on
  while(size > 1){  // While loop active until size=1, when the only process remaining is rank 0
    if(rank < size){  // Filter out odd ranks, which are always >= size after sending their data to their left (even) neighbour
      if( (rank % 2) != 0 ){ // If rank is odd
        MPI_Send(sendbuf, count, MPI_INT, rank-1, rank, comm);  // Send contents of sendbuf to rank-1, using the rank of the odd process as tag
        rank *= size;  // Multiplying odd ranks by size ensures they are always >= size when if(rank < size) is checked on the next while iteration
      }
      else{  // If rank is even
        // For an even rank, its own values are in sendbuf and the odd rank's values are received into recvbuf.
        MPI_Recv(recvbuf, count, MPI_INT, rank+1, rank+1, comm, MPI_STATUS_IGNORE);  // Receive contents of sendbuf from rank+1 into recvbuf
        rank /= 2;  // Halve the rank so for the next iteration of the while loop rank 0 --> rank 0, rank 2 --> rank 1, rank 4 --> rank 2, etc...
        MPI_Reduce_local(sendbuf, recvbuf, count, MPI_INT, op);  // Use MPI_Reduce_local to do the SUM/PROD/MIN/MAX operation and store the result in recvbuf
      }
    }
    size /= 2;  // Halve the size to reflect the processes contracting pairwise
  }

  // Broadcast result back to all processes
  MPI_Bcast(recvbuf, count, MPI_INT, 0, comm);

  return 0;
}
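
For context, a minimal driver along these lines (the names value, mine and reference are just illustrative, not from the original code) is enough to reproduce the behaviour described below: it works with 2 processes and hangs with more.

#include <mpi.h>
#include <stdio.h>

int tree_allreduce(const int *sendbuf, int *recvbuf, int count, MPI_Op op, MPI_Comm comm);

int main(int argc, char **argv){
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int value = rank + 1;          // every rank contributes a different value
  int mine = 0, reference = 0;

  tree_allreduce(&value, &mine, 1, MPI_SUM, MPI_COMM_WORLD);               // routine under test
  MPI_Allreduce(&value, &reference, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);  // known-good result

  printf("rank %d: tree_allreduce = %d, MPI_Allreduce = %d\n", rank, mine, reference);

  MPI_Finalize();
  return 0;
}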

This works fine for size 2, but for any larger size the code hangs indefinitely and I can't figure out why. I suspect I'm making some novice MPI mistake, so please let me know where I'm going wrong.

Suppose you have 8 processes ("rank var" is what is stored in your variable rank, "rank actual" is the actual rank of the worker).

rank var   |01234567
rank actual|01234567

The first iteration works fine, and the data is sent according to this scheme:

  0      1      2      3      4      5      6      7
rcv(1) snd(0) rcv(3) snd(2) rcv(5) snd(4) rcv(7) snd(6)

After that, you knock out the odd workers with the line rank *= size and update the rank variable on the even ones with rank /= 2:

rank var   |0_1_2_3_
rank actual|01234567

In the next iteration the data is sent according to this scheme:

  0      -      2      -      4      -      6      -
rcv(1)   -    snd(0)   -    rcv(3)   -    snd(2)   -

As you can see, it is a mess: workers are waiting for data that is never sent to them. Actual rank 0, for example, posts a receive for a message from actual rank 1, which has already dropped out, while the data it should be combining now comes from actual rank 2, so the receive never matches and the program hangs.
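
One common way around this (a sketch of a fix, not the only one) is to leave the real rank untouched and instead double the distance to the partner each round, so every MPI_Send/MPI_Recv targets an actual rank in the communicator. Something along these lines, assuming as in the question that size is a power of two and the data type is MPI_INT:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int tree_allreduce_fixed(const int *sendbuf, int *recvbuf, int count,
                         MPI_Op op, MPI_Comm comm)
{
  int rank, size;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &size);

  memcpy(recvbuf, sendbuf, count * sizeof(int));  // running partial result
  int *tmp = malloc(count * sizeof(int));         // scratch space for incoming data

  for (int step = 1; step < size; step *= 2) {
    if (rank % (2 * step) != 0) {
      // "Odd" at this level: send the partial result to the partner
      // step positions to the left, then drop out of the reduction.
      MPI_Send(recvbuf, count, MPI_INT, rank - step, 0, comm);
      break;
    }
    // "Even" at this level: receive the partner's partial result and fold it into ours.
    MPI_Recv(tmp, count, MPI_INT, rank + step, 0, comm, MPI_STATUS_IGNORE);
    MPI_Reduce_local(tmp, recvbuf, count, MPI_INT, op);
  }

  free(tmp);

  // Rank 0 now holds the full reduction; share it with everyone.
  MPI_Bcast(recvbuf, count, MPI_INT, 0, comm);

  return 0;
}

The key difference is that the loop variable step changes between rounds instead of rank, so rank - step and rank + step are always valid ranks in comm and every send has a matching receive.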