Does MPI_Reduce need an existing pointer for the receive buffer?

The MPI documentation asserts that the address of the receive buffer (recvbuf) is significant only at the root, meaning that the memory need not be allocated in the other processes. This is confirmed by this question.

int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
               MPI_Op op, int root, MPI_Comm comm)
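
As a minimal sketch of what "significant only at root" means (my illustration, assuming world_rank was obtained from MPI_Comm_rank), the non-root ranks may even pass NULL, since the argument is never dereferenced there:

long send = 42, result;  // result is only meaningful on the root
MPI_Reduce(&send, world_rank == 0 ? &result : NULL, 1, MPI_LONG,
           MPI_SUM, 0, MPI_COMM_WORLD);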

At first I thought that recvbuf did not even have to exist: that the memory for recvbuf itself did not have to be allocated (e.g. by dynamic allocation). Unfortunately (it took me a long time to understand my mistake!), it seems that even when the memory it points to is not valid, the pointer itself has to exist.

See the code below, in which one version gives a segfault while the other does not.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    // MPI initialization
    int world_rank, world_size;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int n1 = 3, n2 = 10; // Sizes of the 2d arrays

    long **observables = (long **) malloc(n1 * sizeof(long *));
    for (int k = 0 ; k < n1 ; ++k) {
        observables[k] = (long *) calloc(n2, sizeof(long));
        for (long i = 0 ; i < n2 ; ++i) {
            observables[k][i] = k * i * world_rank; // Whatever
        }
    }

    long **obs_sum; // This will hold the sum on process 0
#ifdef OLD  // Version that gives a segfault
    if (world_rank == 0) {
        obs_sum = (long **) malloc(n1 * sizeof(long *)); // n1 rows of n2 longs
        for (int k = 0 ; k < n1 ; ++k) {
            obs_sum[k] = (long *) calloc(n2, sizeof(long));
        }
    }
#else // Correct version
    // We define all the pointers in all the processes.
    obs_sum = (long **) malloc(n1 * sizeof(long *));
    if (world_rank == 0) {
        for (int k = 0 ; k < n1 ; ++k) {
            obs_sum[k] = (long *) calloc(n2, sizeof(long));
        }
    }
#endif

    for (int k = 0 ; k < n1 ; ++k) {
        // This is the line that results in a segfault if OLD is defined
        MPI_Reduce(observables[k], obs_sum[k], n2, MPI_LONG, MPI_SUM, 0,
                   MPI_COMM_WORLD);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    // You may free memory here

    return 0;
}

Is my interpretation correct? What is the rationale behind this behavior?

The problem is not MPI, but the fact that you are passing obs_sum[k], which you have never defined/allocated at all.

for (int k = 0 ; k < n1 ; ++k) {
    // This is the line that results in a segfault if OLD is defined
    MPI_Reduce(observables[k], obs_sum[k], n2, MPI_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);
}

Even though MPI_Reduce() does not use that value on non-root ranks, the generated code still loads obs_sum (undefined and not allocated), adds k to it, and tries to read the resulting pointer (segfault) in order to pass it to MPI_Reduce().
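
A stripped-down sketch, independent of MPI, that shows the same failure mode (sink() is a hypothetical function that never touches its argument):

static void sink(long *p) { (void) p; }  // never dereferences its argument

int main(void) {
    long **rows;    // uninitialized, like obs_sum in the OLD version
    sink(rows[1]);  // undefined behavior: rows[1] must read *(rows + 1) before the call
    return 0;
}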

For example, allocating just the array of row pointers should be enough to make it work (note that the root still needs valid row buffers for MPI_Reduce to write the results into):

#else // Correct version
    // We define all the pointers in all the processes.
    obs_sum = (long **) malloc(n1 * sizeof(long *));
    // try commenting out the following lines
    // if (world_rank == 0) {
    //     for (int k = 0 ; k < n1 ; ++k) {
    //         obs_sum[k] = (long *) calloc(n2, sizeof(long));
    //     }
    // }
#endif
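
Alternatively (a sketch of the same idea, not from the original code): since recvbuf is ignored on the non-root ranks, you can skip allocating obs_sum there entirely and avoid evaluating obs_sum[k] by passing NULL directly:

for (int k = 0 ; k < n1 ; ++k) {
    // Non-root ranks pass NULL, so obs_sum[k] is never evaluated there.
    MPI_Reduce(observables[k], world_rank == 0 ? obs_sum[k] : NULL,
               n2, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
}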

I would allocate the 2D array as a flat array; I really hate this array-of-arrays notation. Wouldn't this be better?

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    // MPI initialization
    int world_rank, world_size;
    MPI_Init(NULL, NULL);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int n1 = 3, n2 = 10; // Sizes of the 2d arrays

    long *observables = (long *) malloc(n1*n2*sizeof(long));
    for (int k = 0 ; k < n1 ; ++k) {
        for (long i = 0 ; i < n2 ; ++i) {
            observables[k*n2+i] = k * i * world_rank; // Whatever
        }
    }

    long *obs_sum = NULL; // This will hold the sum on process 0
    if (world_rank == 0) {
        obs_sum = (long *) malloc(n1*n2*sizeof(long));
    }

    MPI_Reduce(observables, obs_sum, n1*n2, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    // You may free memory here

    return 0;
}
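
As a further simplification (a sketch using the standard MPI_IN_PLACE option, not part of the original answer): with the flat layout, the root can reduce directly into its own send buffer, so obs_sum is not needed at all. Note that this overwrites observables on the root:

if (world_rank == 0) {
    // At the root, MPI_IN_PLACE takes the root's contribution from the
    // receive buffer and replaces it with the reduced result.
    MPI_Reduce(MPI_IN_PLACE, observables, n1*n2, MPI_LONG, MPI_SUM, 0,
               MPI_COMM_WORLD);
} else {
    // recvbuf is not significant on non-root ranks, so NULL is fine.
    MPI_Reduce(observables, NULL, n1*n2, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
}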