MPI Send is giving segmentation fault
I am trying to run a genetic algorithm with MPI (Boost), where I have to send a serialized object from rank 0 to all the other ranks. But I get a segmentation fault when I try to send the data.
Here are the code, the output, and the error I get.
Code (the problem occurs at world.send(0, 0, newP);):
int main (int argc, char** argv)
{
    Population *pop = NULL;
    RuckSack r(true);
    int size, rank;
    Ga ga;
    namespace mpi = boost::mpi;
    mpi::environment env;
    mpi::communicator world;
    int countGeneration = 0;
    /* code */
    if (world.rank() == 0)
    {
        if (pop == NULL)
        {
            pop = new Population(60, true);
        }
    }
    for (int m = 0; m < 20; m++)
    {
        /* code */
        for (int i = 0; i < world.size(); i++)
        {
            world.send(i, 0, pop);
        }
        world.recv(0, 0, pop);
        Population newP = *pop;
        newP = ga.evolvePopulation(newP, world.size());
        world.send(0, 0, newP);
        MPI_Finalize();
        return (EXIT_SUCCESS);
    }
}
Error:
mpirun noticed that process rank 0 with PID 10336 on node user exited on signal 11 (Segmentation fault).
Output:
[user:10336] *** Process received signal ***
[user:10336] Signal: Segmentation fault (11)
[user:10336] Signal code: Address not mapped (1)
[user:10336] Failing at address: 0x31
[user:10336] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x35860)[0x7f1e93064860]
[user:10336] [ 1] /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.61.0(+0x14a24)[0x7f1e9409da24]
[user:10336] [ 2] /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.61.0(+0x15d11)[0x7f1e9409ed11]
[user:10336] [ 3] ./teste(+0x1de7c)[0x55ab4c07ae7c]
[user:10336] [ 4] ./teste(+0x1dd2c)[0x55ab4c07ad2c]
[user:10336] [ 5] ./teste(+0x1db3a)[0x55ab4c07ab3a]
[user:10336] [ 6] ./teste(+0x1d8eb)[0x55ab4c07a8eb]
[user:10336] [ 7] ./teste(+0x1d2da)[0x55ab4c07a2da]
[user:10336] [ 8] ./teste(+0x1cb20)[0x55ab4c079b20]
[user:10336] [ 9] ./teste(+0x1bed0)[0x55ab4c078ed0]
[user:10336] [10] ./teste(+0x1b47c)[0x55ab4c07847c]
[user:10336] [11] ./teste(+0x19741)[0x55ab4c076741]
[user:10336] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f1e9304f3f1]
[user:10336] [13] ./teste(+0x112aa)[0x55ab4c06e2aa]
[user:10336] *** End of error message ***
Here are a couple of wild guesses:
- You should execute the initial send only on the rank 0 process. Right now you execute it on every process, which makes no sense (and is probably the cause of the problem).
- You should not send to "self". On the first iteration of your loop, rank 0 sends to itself, which, as far as I know, blocks the process while it waits for a matching recv. But since rank 0 is blocked, it never reaches the 'recv' line and stays locked forever. Apart from that, a process sending data to itself makes little sense anyway.
These are only loose suggestions, since my experience with MPI is limited. A rough sketch of the pattern I have in mind is below. Hope it helps!
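Here is a minimal sketch of that send/receive pattern. It assumes the exchanged object is default-constructible and Boost-serializable; std::vector<int> stands in for your Population, and the "evolve" step is a placeholder comment, not your actual API:

#include <boost/mpi.hpp>
#include <boost/serialization/vector.hpp>
#include <vector>
#include <cstdlib>

namespace mpi = boost::mpi;

int main(int argc, char** argv)
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    // Stand-in for your Population; any Boost-serializable type works here.
    std::vector<int> pop;

    if (world.rank() == 0)
    {
        pop = std::vector<int>(60, 1);        // build the initial data once, on rank 0 only

        // Send it by value to every *other* rank (i starts at 1, so rank 0 never sends to itself).
        for (int i = 1; i < world.size(); ++i)
            world.send(i, 0, pop);

        // Collect one evolved result from each worker.
        for (int i = 1; i < world.size(); ++i)
        {
            std::vector<int> evolved;
            world.recv(i, 0, evolved);
            /* merge `evolved` into the master population here */
        }
    }
    else
    {
        world.recv(0, 0, pop);                // workers block until rank 0's send arrives
        /* evolve `pop` here (your ga.evolvePopulation step) */
        world.send(0, 0, pop);                // send the result back to rank 0
    }

    return EXIT_SUCCESS;                      // mpi::environment finalizes MPI in its destructor
}

If every rank needs the same initial data, boost::mpi::broadcast(world, pop, 0) from <boost/mpi/collectives.hpp> does the distribution step in a single call.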