What do these exit codes mean for an MPI program?
When I try to run my MPI program, it fails. It says:
job aborted:
[ranks] message
[0] process exited without calling finalize
[1-3] terminated
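The error analysis says the exit code is 0xc0000005.
Then I googled it, and someone suggested using MPI_Init_thread instead, but that gave me 255 as the exit code.
How can I fix this? What is wrong with the rank 0 process?
Below is the code snippet that uses MPI to send and receive data: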
// MPI things
MPI_Comm_rank(MPI_COMM_WORLD, &taskid);
// master
if (taskid == 0)
{
    //printf("taskid: %d", taskid);
    average = Nchunk / Nworkers;
    extra = Nchunk % Nworkers;
    mtype = FROM_MASTER;
    offset = 0;
    // store volume[Itemp[n]]
    for (int i = 0; i < Nchunk; i++)
    {
        volumeTemp[i] = volume[Itemp[i]];
    }
    // send to slave
    for (int dest = 1; dest <= Nworkers; dest++)
    {
        Nelements = (dest <= extra) ? average + 1 : average;
        MPI_Send(&Nelements, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
        MPI_Send(&offset, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
        MPI_Send(&Itemp[offset], Nelements, MPI_INT, dest, mtype, MPI_COMM_WORLD);
        MPI_Send(&SMtemp[offset], Nelements, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD);
        MPI_Send(&volumeTemp[offset], Nelements, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD);
        offset = offset + Nelements;
    }
    // receive result from slave
    mtype = FROM_WORKERS;
    for (int source = 1; source <= Nworkers; source++)
    {
        //MPI_Recv(&average, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
        //MPI_Recv(&offset, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
        MPI_Recv(&sinogram[ns], 1, MPI_FLOAT, source, mtype, MPI_COMM_WORLD, &status);
    }
}
//printf("taskid: %d", taskid);
// slave
if (taskid > 0)
{
    //printf("taskid: %d", taskid);
    mtype = FROM_MASTER;
    MPI_Recv(&Nelements, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&Itemp[offset], Nelements, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&SMtemp[offset], Nelements, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
    MPI_Recv(&volumeTemp, Nelements, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD, &status);
    for (int i = 0; i < average; i++)
    {
        if (fabs(volumeTemp[i]) > 1.0e-14)
            sinogram[ns] = sinogram[ns] + volumeTemp[i] * SMtemp[i];
    }
    //send to master
    mtype = FROM_WORKERS;
    MPI_Send(&sinogram[ns], 1, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD, &status);
}
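Exit codes from an MPI job are almost meaningless, because you have multiple processes that each return their own exit code. It is more helpful to rely on the error messages the program prints. Luckily, yours printed one!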
[0] process exited without calling finalize
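This can mean one of two things:
- Your program finished but did not call MPI_Finalize. This is an easy fix: make sure that every path on which your program terminates normally calls MPI_Finalize. This may or may not be your problem...
- Your program terminated abnormally. This is usually harder to track down and may require some of the usual MPI debugging tricks. We're probably not going to be able to fix your problem here if that's the case unless your code is trivially small or you follow the guidelines on creating a good example.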
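If the second case applies, one thing worth checking is the chunk bookkeeping in the question's snippet. On Windows, 0xc0000005 is the access-violation status code, which would be consistent with a read or write outside a buffer. In the snippet, a worker receives into Itemp[offset] and SMtemp[offset] using the master's global offset, and then loops up to average, a variable that is only assigned on rank 0. Whether that is the actual cause cannot be confirmed from the fragment alone. The sketch below is plain C with no MPI calls; Chunk, chunk_for_worker and chunk_fits are hypothetical names introduced here for illustration. It reproduces the average/extra rule from the send loop and checks whether a slice fits in a buffer of a given size.

```c
#include <assert.h>

/* Hypothetical helper types and functions, not part of the original post. */
typedef struct {
    int count;   /* number of elements assigned to this worker */
    int offset;  /* index of the first element in the master's full arrays */
} Chunk;

/* Same rule as the question's send loop: workers are numbered 1..nworkers
   and the first (nchunk % nworkers) workers get one extra element. */
static Chunk chunk_for_worker(int nchunk, int nworkers, int dest)
{
    assert(nworkers > 0 && dest >= 1 && dest <= nworkers);

    int average = nchunk / nworkers;
    int extra   = nchunk % nworkers;
    int before  = dest - 1;  /* number of workers ahead of this one */

    Chunk c;
    c.count  = (dest <= extra) ? average + 1 : average;
    /* Each earlier worker holds `average` elements, plus one more for
       every earlier worker that is among the first `extra`. */
    c.offset = before * average + (before < extra ? before : extra);
    return c;
}

/* Returns 1 if the slice [offset, offset + count) lies inside a buffer
   holding `capacity` elements, 0 otherwise. */
static int chunk_fits(Chunk c, int capacity)
{
    return c.offset >= 0 && c.count >= 0 && c.offset + c.count <= capacity;
}
```

For example, with 10 elements and 3 workers, worker 3 gets 3 elements starting at offset 7. That slice fits in a 10-element array on the master, but not in a worker-side buffer sized for only its own 3 elements. A worker that allocates just its own chunk would therefore receive at index 0 and loop up to the received count rather than at the global offset.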