Does MPI_Scatter influence MPI_Bcast?

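I am sending an integer that triggers termination via MPI_Bcast. The root sets a variable called "running" to zero and sends the Bcast. The Bcast seems to complete, but I can't see that the value reaches the other processes. The other processes seem to be waiting for an MPI_Scatter to complete. They shouldn't even be able to get there.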

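I have done a lot of research on MPI_Bcast, and from what I understand it should be blocking. This confuses me, because the MPI_Bcast on the root seems to complete even though I can't find a matching (receiving) MPI_Bcast on the other processes. I have surrounded the MPI_Bcast calls with printfs, and on the root those printfs 1) do print and 2) print the correct value.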

The root looks as follows:

while (running || ...) {
    /*Do stuff*/
    if (...) {
        running = 0;
        printf("Running = %d and Bcast from root\n", running);
        MPI_Bcast(&running, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("Root 0 Bcast complete. Running %d\n", running);
        /* Do some more stuff and eventually reach Finalize */
        printf("Root is Finalizing\n");
        MPI_Finalize();
    }
}

The other processes have the following code:

while (running) {
    doThisFunction(rank);
    printf("Waiting on BCast from root with myRank: %d\n", rank);
    MPI_Bcast(&running, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("P%d received running = %d\n", rank, running);
    if (running == 0) { // just to make sure.
        break;
    }
}
MPI_Finalize();

I also have the following in the function "doThisFunction()". This is where the processes seem to be waiting on process 0:

int doThisFunction(...) {
    /*Do stuff*/
    printf("P%d waiting on Scatter\n", rank);
    MPI_Scatter(buffer, 130, MPI_BYTE, encoded, 130, MPI_BYTE, 0, MPI_COMM_WORLD);
    printf("P%d done with Scatter\n", rank);
    /*Do stuff*/
    printf("P%d waiting on gather\n", rank);
    MPI_Gather(encoded, 1, MPI_INT, buffer, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("P%d done with gather\n", rank);
    /*Do Stuff*/
    return aValue;
}

The output on the command line looks like this:

P0 waiting on Scatter
P0 done with Scatter
P0 waiting on gather
P0 done with gather
Waiting on BCast from root with myRank: 1
P1 received running = 1
P1 waiting on Scatter
P0 waiting on Scatter
P0 done with Scatter
P0 waiting on gather
P0 done with gather
P1 done with Scatter
P1 waiting on gather
P1 done with gather
Waiting on BCast from root with myRank: 1
P1 received running = 1
P1 waiting on Scatter
Running = 0 and Bcast from root
Root 0 Bcast complete. Running 0
/* Why does it say the Bcast is complete 
/* even though P1 didn't output that it received it?
Root is Finalizing
/* Deadlocked...

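I would expect P1 to receive running as zero and then go into MPI_Finalize(), but instead it gets stuck on the scatter, which the root will never reach because it is already trying to finalize.

In effect, the program is deadlocked and never finalizes MPI.

I suspect the problem is that the scatter is accepting the Bcast value, although that doesn't even make sense, because the root doesn't call scatter at that point.

Does anyone have any tips on how to resolve this?

Your help is greatly appreciated.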

Why does it say the Bcast is complete even though P1 didn't output that it received it?

Note the following definition in the MPI Standard:

Collective operations can (but are not required to) complete as soon as the caller's participation in the collective communication is finished. ... The completion of a collective operation indicates that the caller is free to modify locations in the communication buffer. It does not indicate that other processes in the group have completed or even started the operation (unless otherwise implied by the description of the operation). Thus, a collective communication operation may, or may not, have the effect of synchronizing all calling processes. This statement excludes, of course, the barrier operation.

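By this definition, your MPI_Bcast on the root process can complete even if none of the other processes has called MPI_Bcast yet.

(For point-to-point operations there are different communication modes, such as the synchronous mode, that address this. Unfortunately, there is no synchronous mode for collectives.)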


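The order of operations in your code appears to be the problem. The root called MPI_Bcast, but process #1 did not; it is waiting in MPI_Scatter instead, as your log output shows.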