Why does MPI_Send block when I try to send a 2D int array?

Question

I am trying to compute a fractal image in parallel with MPI. I have split my program into 4 parts:

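Balance the number of rows handled by each rank
Perform the computation on each row assigned to the rank
Send the number of rows and the rows themselves to rank 0
Process the data on rank 0 (for testing, just print the ints)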

Steps 1 and 2 work, but when I try to send the rows to rank 0 the program stops and blocks. I know that MPI_Send can block, but I see no reason for it to do so here.

Here is the code, step by step:

Step 1:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Include the MPI library for function calls */
#include <mpi.h>

/* Define tags for each MPI_Send()/MPI_Recv() pair so distinct messages can be
 * sent */
#define OTHER_N_ROWS_TAG 0
#define OTHER_PIXELS_TAG 1

int main(int argc, char **argv) {
  const int nRows = 513;
  const int nCols = 513;
  const int middleRow = 0.5 * (nRows - 1);
  const int middleCol = 0.5 * (nCols - 1);
  const double step = 0.00625;
  const int depth = 100;
  int pixels[nRows][nCols];
  int row;
  int col;
  double xCoord;
  double yCoord;
  int i;
  double x;
  double y;
  double tmp;
  int myRank;
  int nRanks;
  int evenSplit;
  int nRanksWith1Extra;
  int myRow0;
  int myNRows;
  int rank;
  int otherNRows;
  int otherPixels[nRows][nCols];

  /* Each rank sets up MPI */
  MPI_Init(&argc, &argv);

  /* Each rank determines its ID and the total number of ranks */
  MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
  MPI_Comm_size(MPI_COMM_WORLD, &nRanks);
  printf("My rank is %d \n",myRank);
  evenSplit = nRows / nRanks;
  nRanksWith1Extra = nRows % nRanks;

/* Each rank determines how many rows it has to process (well balanced) */
  if (myRank < nRanksWith1Extra) {

    myNRows = evenSplit + 1;
    myRow0 = myRank * (evenSplit + 1);
  }
  else {
    myNRows = evenSplit;
    myRow0 = (nRanksWith1Extra * (evenSplit + 1)) +
      ((myRank - nRanksWith1Extra) * evenSplit);
  }
/*__________________________________________________________________________________*/

Step 2:

/*_____________________PERFORM CALCUL ON EACH PIXEL________________________________ */
  for (row = myRow0; row < myRow0 + myNRows; row++) {

    /* Each rank loops over the columns in the given row */
    for (col = 0; col < nCols; col++) {

      /* Each rank sets the (x,y) coordinate for the pixel in the given row and 
       * column */
      xCoord = (col - middleCol) * step;
      yCoord = (row - middleRow) * step;

      /* Each rank calculates the number of iterations for the pixel in the
       * given row and column */
      i = 0;
      x = 0;
      y = 0;
      while ((x*x + y*y < 4) && (i < depth)) {
        tmp = x*x - y*y + xCoord;
        y = 2*x*y + yCoord;
        x = tmp;
        i++;
      }

      /* Each rank stores the number of iterations for the pixel in the given
       * row and column. The initial row is subtracted from the current row
       * so the array starts at 0 */
      pixels[row - myRow0][col] = i;
    }
      //printf("one row performed by %d \n",myRank);

  }
      printf("work done by %d \n",myRank);
/*_________________________________________________________________________________*/

Step 3:

/*__________________________SEND DATA TO RANK 0____________________________________*/

  /* Each rank (including Rank 0) sends its number of rows to Rank 0 so Rank 0
   * can tell how many pixels to receive */
  MPI_Send(&myNRows, 1, MPI_INT, 0, OTHER_N_ROWS_TAG, MPI_COMM_WORLD);
  printf("test \n");
  /* Each rank (including Rank 0) sends its pixels array to Rank 0 so Rank 0
   * can print it */
  MPI_Send(&pixels, sizeof(int)*myNRows * nCols, MPI_BYTE, 0, OTHER_PIXELS_TAG,
      MPI_COMM_WORLD);
  printf("enter ranking 0 \n");
/*_________________________________________________________________________________*/

Step 4:

/*________________________TREAT EACH ROW IN RANK 0_________________________________*/
  /* Only Rank 0 prints so the output is in order */
  if (myRank == 0) {

    /* Rank 0 loops over each rank so it can receive that rank's messages */
    for (rank = 0; rank < nRanks; rank++){

      /* Rank 0 receives the number of rows from the given rank so it knows how
       * many pixels to receive in the next message */
      MPI_Recv(&otherNRows, 1, MPI_INT, rank, OTHER_N_ROWS_TAG,
      MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      /* Rank 0 receives the pixels array from each of the other ranks
       * (including itself) so it can print the number of iterations for each
       * pixel */
      MPI_Recv(&otherPixels, otherNRows * nCols, MPI_INT, rank,
          OTHER_PIXELS_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      /* Rank 0 loops over the rows for the given rank */
      for (row = 0; row < otherNRows; row++) {

        /* Rank 0 loops over the columns within the given row */
        for (col = 0; col < nCols; col++) {

          /* Rank 0 prints the value of the pixel at the given row and column
           * followed by a comma */
          printf("%d,", otherPixels[row][col]);
        }

        /* In between rows, Rank 0 prints a newline character */
        printf("\n");
      }
    }
  }

  /* All processes clean up the MPI environment */
  MPI_Finalize();

  return 0;
}

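I would like to understand why it blocks. Could you explain? I am new to MPI and I want to learn it, not just end up with a program that runs.

Thanks in advance.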

Answer 1

Sending to rank 0 from rank 0 itself with the blocking send/recv calls can lead to deadlock.

From the MPI 3.0 standard, Section 3.2.4:

Source = destination is allowed, that is, a process can send a message to itself. (However, it is unsafe to do so with the blocking send and receive operations described above, since this may lead to deadlock. See Section 3.5.)

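Possible solutions:

Use the non-blocking send/recv calls when rank 0 sends to or receives from itself. For details, see the MPI_Isend, MPI_Irecv and MPI_Wait routines.
Eliminate the communication of rank 0 with itself. Since you are already on rank 0, you already know how many pixels it has to compute.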

Answer 2

MPI_Send is, by the definition in the standard, a blocking operation.

Note that blocking means:

it does not return until the message data and envelope have been safely stored away so that the sender is free to modify the send buffer. The message might be copied directly into the matching receive buffer, or it might be copied into a temporary system buffer.

Having a rank send a message to itself with MPI_Send and MPI_Recv is a deadlock.

The idiomatic pattern for your situation is to use the appropriate collective communication operations, MPI_Gather and MPI_Gatherv.
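
Since the ranks in the question hold different numbers of rows, MPI_Gatherv is the variant that fits: on the root it needs one receive count and one displacement per rank. A minimal sketch of computing those two arrays, reusing the row split from the question, could look like this (gatherv_layout is a hypothetical helper name, not part of MPI):

```c
/* Hypothetical helper, not from the original post.
 * Fills counts[r] with the number of ints rank r contributes and displs[r]
 * with the offset (in ints) of that block in the root's receive buffer.
 * Rows are split as in the question: the first nRows % nRanks ranks get one
 * extra row. Both arrays must have room for nRanks entries. */
void gatherv_layout(int nRows, int nCols, int nRanks, int *counts, int *displs)
{
    int evenSplit = nRows / nRanks;
    int nRanksWith1Extra = nRows % nRanks;
    int offset = 0;
    int rank;

    for (rank = 0; rank < nRanks; rank++) {
        int rows = evenSplit + (rank < nRanksWith1Extra ? 1 : 0);

        counts[rank] = rows * nCols;  /* whole rows, counted in ints */
        displs[rank] = offset;        /* blocks are packed back to back */
        offset += counts[rank];
    }
}
```

Each rank would then pass its own myNRows * nCols ints as the send buffer, and rank 0 would pass counts and displs as the recvcounts and displs arguments of MPI_Gatherv with MPI_INT on both sides. A single collective call replaces both send/receive pairs, including the problematic message from rank 0 to itself.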

Answer 3

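As stated in a previous answer, MPI_Send() might block.

From a theoretical MPI point of view, your application is incorrect because of a potential deadlock (rank 0 calls MPI_Send() to itself when no receive has been posted).

From a very pragmatic point of view, MPI_Send() generally returns immediately when sending a small message (such as myNRows), but blocks until a matching receive is posted when sending a large message (such as pixels). Please keep in mind that:

small and large depend at least on the MPI library and the interconnect being used
from an MPI point of view, it is incorrect to assume that MPI_Send() returns immediately for small messages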

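If you really want to make sure your application is deadlock-free, simply replace MPI_Send() with MPI_Ssend(). MPI_Ssend() completes only once the matching receive has started, so any reliance on buffering shows up as a hang during testing.

Back to your question, you have several options:

revamp your application so that rank 0 does not communicate with itself (all the information is already available, so no communication is needed)
post an MPI_Irecv() before the MPI_Send(), and replace MPI_Recv(source=0) with MPI_Wait()
revamp your application so that rank 0 calls neither MPI_Send() nor MPI_Recv(source=0), but MPI_Sendrecv instead. This is the option I recommend, since it only requires a small change to the communication pattern (the computation pattern is left unchanged), which is more elegant in my opinion.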
