Problem sending multiple non-blocking messages with MPI_Isend and receiving with MPI_Recv in C
I am writing a parallel algorithm and ran into a problem with non-blocking communication. I modeled my problem with the following code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int a, i, j;
    int maxNumber = 8192;
    /* (maxNumber + 1) x (maxNumber + 1) matrix; the row type must match the allocation */
    int (*tab)[maxNumber + 1] = malloc(sizeof(int[maxNumber + 1][maxNumber + 1]));
    /* a single request, overwritten by every MPI_Isend and never completed */
    MPI_Request* r = malloc(sizeof *r);

    if (rank == 0) {
        for (i = 0; i < maxNumber + 1; i++) {
            for (j = 0; j < maxNumber + 1; j++) {
                tab[i][j] = 2*i + i*j;
            }
            for (a = 1; a < p; a++) {
                MPI_Isend(&tab[i], maxNumber + 1, MPI_INT, a, i, MPI_COMM_WORLD, r);
                printf("Process 0 send the block %d to process %d\n", i, a);
            }
        }
    }
    else {
        for (j = 0; j < maxNumber + 1; j++) {
            printf("Process %d wait the block %d to process 0\n", rank, j);
            MPI_Recv(&tab[j], maxNumber + 1, MPI_INT, 0, j, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process %d receive the block %d to process 0\n", rank, j);
        }
    }

    MPI_Finalize();
    return 0;
}
After some computation, process 0 sends each row of an 8192 × 8192 matrix to the other processes. The problem is that process 0 finishes sending all 8192 rows long before the other processes have received the data.
Here is part of the output:
...
...
Process 0 send the block 8187 to process 1
Process 0 send the block 8188 to process 1
Process 0 send the block 8189 to process 1
Process 0 send the block 8190 to process 1
Process 0 send the block 8191 to process 1
Process 0 send the block 8192 to process 1
Process 1 receive the block 5 to process 0
Process 1 wait the block 6 to process 0
Process 1 receive the block 6 to process 0
Process 1 wait the block 7 to process 0
Process 1 receive the block 7 to process 0
Process 1 wait the block 8 to process 0
Process 1 receive the block 8 to process 0
Process 1 wait the block 9 to process 0
Process 1 receive the block 9 to process 0
...
...
PS: the sends must be non-blocking because, in my problem, process 0 performs O(n²/p²) computation at each iteration and then sends the rows to the other processes in order, so that they can start their own computation as soon as possible.
Do you know what I can do to solve this problem?
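For context, the usual way to keep sends non-blocking without losing track of them is to keep one MPI_Request per outstanding send and complete them all with MPI_Waitall once the send buffer is no longer needed. This is a minimal sketch of that pattern, not the poster's code; the sizes are illustrative:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int n = 1024;                           /* illustrative row length */
    int *row = malloc(n * sizeof *row);

    if (rank == 0 && p > 1) {
        /* one request per outstanding non-blocking send */
        MPI_Request *reqs = malloc((p - 1) * sizeof *reqs);
        for (int a = 1; a < p; a++)
            MPI_Isend(row, n, MPI_INT, a, 0, MPI_COMM_WORLD, &reqs[a - 1]);
        /* the buffer may only be reused or freed after completion */
        MPI_Waitall(p - 1, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    } else if (rank > 0) {
        MPI_Recv(row, n, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(row);
    MPI_Finalize();
    return 0;
}
```

Note that MPI_Waitall makes rank 0 block until the receivers have matched the sends, so it trades some of the overlap the poster wants for correctness; batching the waits (e.g. per row) is a common middle ground.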
Thanks to @Gilles for the answer, which allowed me to solve my problem. I needed to use MPI_Ibsend, allocating the required amount of buffer space into which the data can be copied until it is delivered.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int a, i, j;
    int maxNumber = atoi(argv[1]);
    /* (maxNumber + 1) x (maxNumber + 1) matrix; the row type must match the allocation */
    int (*tab)[maxNumber + 1] = malloc(sizeof(int[maxNumber + 1][maxNumber + 1]));
    /* one request per row, rows 0..maxNumber inclusive */
    MPI_Request* tabReq = malloc((maxNumber + 1) * sizeof *tabReq);
    int bufsize = maxNumber * maxNumber;
    char *buf = malloc(bufsize);

    if (rank == 0) {
        for (i = 0; i < maxNumber + 1; i++) {
            for (j = 0; j < maxNumber + 1; j++) {
                tab[i][j] = 2*i + i*j;
            }
            for (a = 1; a < p; a++) {
                MPI_Buffer_attach(buf, bufsize);
                MPI_Ibsend(&tab[i], maxNumber + 1, MPI_INT, a, i, MPI_COMM_WORLD, &tabReq[i]);
                /* MPI_Buffer_detach blocks until the buffered message has been transmitted */
                MPI_Buffer_detach(&buf, &bufsize);
                printf("Process 0 send the block %d to process %d\n", i, a);
            }
        }
    }
    else {
        for (j = 0; j < maxNumber + 1; j++) {
            printf("Process %d wait the block %d to process 0\n", rank, j);
            MPI_Recv(&tab[j], maxNumber + 1, MPI_INT, 0, j, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process %d receive the block %d to process 0\n", rank, j);
        }
    }

    MPI_Finalize();
    return 0;
}
Here is part of the output:
...
...
Process 1 wait the block 8186 to process 0
Process 0 send the block 8185 to process 1
Process 1 receive the block 8186 to process 0
Process 1 wait the block 8187 to process 0
Process 0 send the block 8186 to process 1
Process 1 receive the block 8187 to process 0
Process 1 wait the block 8188 to process 0
Process 0 send the block 8187 to process 1
Process 1 receive the block 8188 to process 0
Process 1 wait the block 8189 to process 0
Process 0 send the block 8188 to process 1
Process 1 receive the block 8189 to process 0
Process 1 wait the block 8190 to process 0
Process 0 send the block 8189 to process 1
Process 1 receive the block 8190 to process 0
Process 1 wait the block 8191 to process 0
Process 0 send the block 8190 to process 1
Process 1 receive the block 8191 to process 0
Process 1 wait the block 8192 to process 0
Process 0 send the block 8191 to process 1
Process 1 receive the block 8192 to process 0
Process 0 send the block 8192 to process 1