Is it wrong to call MPI_Bcast several times?

I am trying to implement matrix-vector multiplication with MPI (i.e. an nxn matrix multiplied by an nx1 vector).

Originally, I decided to use multiple MPI_Bcast calls (before I noticed MPI_AllGather...), and I stumbled upon some strange behavior. Apparently, data gets received no matter which rank is passed to the MPI_Bcast call.

Part of the code used (the functions are called right after each other, so the sending broadcasts happen before the receiving broadcasts). The prints are for debugging purposes only, and I know the test data has length 2:

class Processor
{
public:
    Processor(int rank, int communicatorSize);

private:
    void broadcastOwnVectorToOtherRanks();
    void receiveBroadcastsFromOtherRanks();
    //...

    int ownRank;
    int communicatorSize;
    std::vector<int> ownVectorPart;
    std::vector<int> totalVector;
    //...
};

void Processor::broadcastOwnVectorToOtherRanks()
{
    //ownVectorPart is correctly filled before this function call
    std::printf("Own data in vector %d %d\n", ownVectorPart[0], ownVectorPart[1]);
    MPI_Bcast(ownVectorPart.data(), ownVectorPart.size(), MPI_INT, ownRank, MPI_COMM_WORLD);
}

void Processor::receiveBroadcastsFromOtherRanks()
{
    for (int rank = 0; rank < communicatorSize; ++rank)
    {
        if (rank == ownRank)
        {
            totalVector.insert(totalVector.end(), ownVectorPart.begin(), ownVectorPart.end());
        }
        else
        {
            std::vector<int> buffer(ownVectorPart.size());
            //intended to receive the part broadcast by `rank`
            MPI_Bcast(buffer.data(), ownVectorPart.size(), MPI_INT, rank, MPI_COMM_WORLD);
            std::printf("Received from process with rank %d: %d %d\n", rank, buffer[0], buffer[1]);
            totalVector.insert(totalVector.end(), buffer.begin(), buffer.end());
        }
    }
}

The results (sorted by rank):

[0] Own data in vector 0 1
[0] Received from process with rank 1: 6 7
[0] Received from process with rank 2: 4 5
[0] Received from process with rank 3: 2 3
[1] Own data in vector 2 3
[1] Received from process with rank 0: 0 1
[1] Received from process with rank 2: 4 5
[1] Received from process with rank 3: 6 7
[2] Own data in vector 4 5
[2] Received from process with rank 0: 0 1
[2] Received from process with rank 1: 2 3
[2] Received from process with rank 3: 6 7
[3] Own data in vector 6 7
[3] Received from process with rank 0: 4 5
[3] Received from process with rank 1: 2 3
[3] Received from process with rank 2: 0 1

As you can see, in the processes with ranks 0 and 3 the received data differs from the data that was sent. For example, the process with rank 0 received data from rank 3, even though it expected data from process 1.

It seems to me that the rank is ignored when receiving broadcast data, and MPI assigns the data in whatever order it arrives, whether or not it comes from the expected rank.

Why does MPI_Bcast receive data from the process with rank 3 when the rank passed as an argument is 1? Is it undefined behavior to have several MPI_Bcast calls in flight at the same time? Or is there an error in my code?

Quoting the MPI 3.1 standard (section 5.12):

All processes must call collective operations (blocking and nonblocking) in the same order per communicator. In particular, once a process calls a collective operation, all other processes in the communicator must eventually call the same collective operation, and no other collective operation with the same communicator in between.

Combined with section 5.4:

If comm is an intracommunicator, MPI_BCAST broadcasts a message from the process with rank root to all processes of the group, itself included. It is called by all members of the group using the same arguments for comm and root.

I read these two sections to mean that you must call MPI_Bcast, and similar collective communication functions, in the same order and with the same arguments on every process; calling it with different root values is invalid. In practice, implementations match collective calls on a communicator by call order, so in your code the i-th MPI_Bcast on one rank gets paired with the i-th MPI_Bcast on another rank regardless of the root argument, which is exactly the mismatched data you observed.
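If you do want to stay with broadcasts, a minimal sketch of a conforming loop looks like this (the helper name gatherByBroadcast and its parameters are illustrative, not part of your class): every rank iterates over the roots in the same order and passes the same root to each call.

#include <mpi.h>
#include <vector>

//Illustrative helper: gathers every rank's part into one vector using a
//conforming sequence of broadcasts (same roots, same order, on all ranks).
std::vector<int> gatherByBroadcast(const std::vector<int>& ownPart,
                                   int ownRank, int communicatorSize)
{
    std::vector<int> total;
    std::vector<int> buffer(ownPart.size());
    for (int root = 0; root < communicatorSize; ++root)
    {
        //Every rank passes the same root in the same iteration. The root's
        //buffer acts as the send buffer, everyone else's as the receive buffer.
        if (root == ownRank)
            buffer = ownPart;
        MPI_Bcast(buffer.data(), static_cast<int>(buffer.size()), MPI_INT,
                  root, MPI_COMM_WORLD);
        total.insert(total.end(), buffer.begin(), buffer.end());
    }
    return total;
}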

That said, I believe MPI_Allgather is better suited to the communication you seem to want: it gathers an equal amount of data from every process and copies it to every process.
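For comparison, a minimal sketch of the same exchange using MPI_Allgather (again with an illustrative helper name; this assumes every rank contributes the same number of elements, which MPI_Allgather requires):

#include <mpi.h>
#include <vector>

//Illustrative helper: one collective call distributes every rank's part
//to every rank, concatenated in rank order.
std::vector<int> gatherByAllgather(const std::vector<int>& ownPart,
                                   int communicatorSize)
{
    std::vector<int> total(ownPart.size() * communicatorSize);
    MPI_Allgather(ownPart.data(), static_cast<int>(ownPart.size()), MPI_INT,
                  total.data(), static_cast<int>(ownPart.size()), MPI_INT,
                  MPI_COMM_WORLD);
    return total;
}

This replaces both of your functions with a single call and leaves the result already ordered by rank, so no manual insertion into totalVector is needed.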