Better way to put different parts of an array spread across all processes into a single final array with MPI in C

Question

I'm giving this code only as an example, so you can see what I'm looking for:

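  double *f  = malloc(sizeof(double) * nx * ny);
  /* calloc: entries this process does not own must be 0 for MPI_SUM to work */
  double *f2 = calloc(nx * ny, sizeof(double));
  /* each process fills only its own block of rows */
  for ( i = process * (nx/totalProcesses); i < (process + 1) * (nx/totalProcesses); i++ )
  {
    for ( j = 0; j < ny; j++ )
    {
      f2[i*ny + j] = j*i;
    }
  }
  /* summing the zero-padded arrays assembles the full array on every process */
  MPI_Allreduce( f2, f, nx*ny, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD );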

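Yes, it works: in the end I get the correct result in 'f', which is what I want. But I would like to know whether there is a better or more direct way to achieve the same thing, for the sake of efficiency. I tried with allgather, but couldn't get the correct result.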

Answer 1

but I would like to know if there is a better or more direct way to achieve the same in order to get efficiency.

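No. In the given context, using an MPI collective routine is (in theory) always more efficient than the send/recv alternative. The MPI standard does not mandate it, but a good implementation will carry out collective routines such as MPI_Allreduce in log(p) steps, where p is the number of processes.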

Bear in mind, however, that MPI_Allreduce:

Combines values from all processes and distributes the result back to all processes.

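So, if you do not actually need the result on every process, you can use MPI_Reduce instead, which leaves the result on a single root process: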

Reduces values on all processes to a single value

Tags: c, arrays, parallel-processing, mpi, openmpi