CPU time limit exceeded when calling MPI_Send on a very large int*
I ran into a problem when trying to send a very large message with MPI_Send: there are several processes, and the total number of ints we need to transfer is 2^25. I tested with a size of 1000 and my code ran fine, but when I set it to the size my professor requires, it hangs for a long time and then returns messages like this:
2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpiexec noticed that process rank 0 with PID 0 on node srv-p22-13 exited on signal 24 (CPU time limit exceeded).
I put a "cout" after every line of code, and I'm sure it gets stuck right at the MPI_Send line once the size of Si exceeds 20,000,000. I'm not sure whether that is the cause. But from what I found, the maximum count MPI_Send accepts is INT_MAX (2^31-1), which is far larger than 2^25... so I'm confused.
Here is the main part of my code:
//This is the send part
for (int i = 0; i < 5; i++) {
    if (i != my_rank) {          // my_rank is from MPI_Comm_rank(MPI_COMM_WORLD, &my_rank)
        int n = A.size();        // A is a vector of int
        int* Si = new int[n];    // I want to convert the vector to an int array
        std::copy(A.begin(), A.end(), Si);
        MPI_Send(Si, n, Type, i, my_rank, MPI_COMM_WORLD); // **The code gets stuck here: CPU time limit exceeded
        delete[] Si;
    }
}
MPI_Barrier(MPI_COMM_WORLD); // I want all processes to finish the sending part, then start receiving and saving into the vector
//This is the receive part
for (int i = 0; i < 5; i++) {
    if (i != my_rank) {
        MPI_Status status;
        MPI_Probe(i, i, MPI_COMM_WORLD, &status);
        int rn = 0;
        MPI_Get_count(&status, Type, &rn);
        int* Ri = new int[rn];
        MPI_Recv(Ri, rn, Type, i, i, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /*Save received elements into vector A*/
        for (int j = 0; j < rn; j++) {
            A.push_back(Ri[j]);
        }
        delete[] Ri;
    }
}
Many thanks to @jacob, who shared a link to a similar question with me. After reading it, I realized I had made the same mistake: the processes cannot all send at the same time, so I switched to MPI_Sendrecv, following this question: MPI hangs on MPI_Send for large messages.