Writing my own version of MPI_Allreduce in C - why does my code hang indefinitely?
I am trying to write my own version of MPI_Allreduce in C, but only for power-of-two sizes, i.e. size = 2, 4, 8, 16, ..., and only for the MPI_INT datatype. My code so far is:
int tree_allreduce(const int *sendbuf, int *recvbuf, int count, MPI_Op op, MPI_Comm comm){

    // Create variables for rank and size
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    // While size is greater than 1 there are 2 or more ranks to operate on
    while(size > 1){ // While loop active until size=1 when the only process remaining is rank 0
        if(rank < size){ // Filter out odd ranks, which are always bigger than size after sending their data to their left even neighbour's recvbuf
            if( (rank % 2) != 0 ){ // If rank is odd
                MPI_Send(sendbuf, count, MPI_INT, rank-1, rank, comm); // Send contents of sendbuf to rank-1, using the rank of the odd process as the tag
                rank *= size; // Multiplying odd ranks by size ensures they are always >= size when if(rank < size) is checked on the next while iteration
            }
            else{ // If rank is even
                // For an even rank, the values of the even rank are stored in sendbuf, and the values of the odd rank are stored in recvbuf.
                MPI_Recv(recvbuf, count, MPI_INT, rank+1, rank+1, comm, MPI_STATUS_IGNORE); // Receive contents of sendbuf from rank+1 into recvbuf
                rank /= 2; // Halve the rank so on the next iteration of the while loop rank 0 --> rank 0, rank 2 --> rank 1, rank 4 --> rank 2, etc...
                MPI_Reduce_local(sendbuf, recvbuf, count, MPI_INT, op); // Use MPI_Reduce_local to do the SUM/PROD/MIN/MAX operation and put the result into recvbuf
            }
        }
        size /= 2; // Halve the size to reflect the processes contracting pairwise
    }

    // Broadcast result back to all processes
    MPI_Bcast(recvbuf, count, MPI_INT, 0, comm);

    return 0;
}
This works fine for size 2, but for any larger size the code hangs indefinitely and I cannot figure out why. I suspect I am making some novice MPI mistake, so please let me know where I have gone wrong.
Suppose you have 8 processors (`rank var` is what is stored in your variable `rank`, and `rank actual` is the actual worker rank).
rank var |01234567
rank actual|01234567
The first iteration works fine; data is sent according to this scheme:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
rcv(1) | snd(0) | rcv(3) | snd(2) | rcv(5) | snd(4) | rcv(7) | snd(6) |
After that, you retire the odd workers via the line `rank *= size`, and update the rank variable of the even ones with `rank /= 2`:
rank var |0_1_2_3_
rank actual|01234567
On the next iteration, data is sent according to this scheme:
0 | - | 2 | - | 4 | - | 6 | - |
---|---|---|---|---|---|---|---|
rcv(1) | - | snd(0) | - | rcv(3) | - | snd(2) | - |
As you can see, it is a mess: workers are waiting for data that is never sent to them.