Is MPI_Bcast supposed to work with MPI_Ibcast?

As far as I know, you can freely mix blocking and non-blocking MPI operations on both ends of a communication, meaning that an MPI_Send(...) can be received by an MPI_Irecv(...).

That said, I cannot get MPI_Bcast(...) to work together with MPI_Ibcast(...), as in the following example:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  MPI_Init(NULL, NULL);
  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Request req;
  int i;  
  if (world_rank == 0) {
    i = 126;
    MPI_Ibcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD, &req);
    // do other stuff

    MPI_Wait(&req, MPI_STATUS_IGNORE);

  } else { 
    MPI_Bcast(&i, 1, MPI_INT, 0, MPI_COMM_WORLD);
  }
  
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}

Is this supposed to work? I could not find anything about it in the MPI documentation.

I am using MPICH 3.3.2 with GCC 10.2.1.

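Short answer: no, the code is not supposed to work.

Long answer: blocking and nonblocking collective calls cannot be matched with each other, because: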

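  1. Collective operations have no tag argument, because using tags for collective operations can prevent certain hardware optimizations. Message matching therefore cannot be done the way it is for point-to-point operations.

  2. An implementation may use different communication algorithms for the blocking and nonblocking cases. For example, a blocking collective operation may be optimized for minimal time to completion.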

This is stated explicitly in the MPI 3.1 standard:

Unlike point-to-point operations, nonblocking collective operations do not match with blocking collective operations, and collective operations do not have a tag argument. All processes must call collective operations (blocking and nonblocking) in the same order per communicator. In particular, once a process calls a collective operation, all other processes in the communicator must eventually call the same collective operation, and no other collective operation with the same communicator in between. This is consistent with the ordering rules for blocking collective operations in threaded environments.

Rationale. Matching blocking and nonblocking collective operations is not allowed because the implementation might use different communication algorithms for the two cases. Blocking collective operations may be optimized for minimal time to completion, while nonblocking collective operations may balance time to completion with CPU overhead and asynchronous progression. The use of tags for collective operations can prevent certain hardware optimizations. (End of rationale.)

Hope this helps!