Splitting and Passing Array Blocks in MPI

I'm new to MPI and I'm trying to understand it by writing a simple C program. All I want to do is split an array and send the blocks to N processes. Each process then finds the local minimum within its block. Afterwards the program (at the root rank, or somewhere else) finds the global minimum.

I've looked into the MPI_Send, MPI_Isend and MPI_Bcast functions, but I'm a bit confused about where to use one instead of another. I need some hints about the general structure of my program:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 9 // array size

int A[N] = {0,2,1,5,4,3,7,6,8}; // this is a dummy array

int main(int argc, char *argv[]) {

    int i, k = 0, size, rank, source = 0, dest = 1, count;
    int tag = 1234;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    count = N/(size-1); // think size = 4 for this example

    int *tempArray = malloc(count * sizeof(int)); 
    int *localMins = malloc((size-1) * sizeof(int)); 

    if (rank == 0) {

        for(i=0; i<size; i+=count) 
        {
            // Is it better to use MPI_Isend or MPI_Bcast here?
            MPI_Send(&A[i], count, MPI_INT, dest, tag, MPI_COMM_WORLD);
            printf("P0 sent a %d elements to P%d.\n", count, dest);
            dest++;
        }
    }
    else {

        for(i=0; i<size; i+=count) 
        {       
            MPI_Recv(tempArray, count, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            localMins[k] = findMin(tempArray, count);
            printf("Min for P%d is %d.\n", rank, localMins[k]);
            k++;            
        }
    }

    MPI_Finalize();

    int gMin = findMin(localMins, (size-1)); // where should I assign this
    printf("Global min: %d\n", gMin); // and where should I print the results?

    return 0;
}

There are probably multiple errors in my code; sorry that I can't pin down the exact problem here. Thanks for any suggestions.

There are several problems with your code (as you have already pointed out), and as some commenters have already mentioned, there are alternative ways to accomplish what you are trying to do using MPI calls.

However, I am going to repurpose your code, trying not to change too much, in order to show you what is going on.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 9 // array size
int A[N] = {0,2,1,5,4,3,7,6,8}; // this is a dummy array that should only be initialized on rank == ROOT

/* Helper used below: returns the smallest element of arr[0..len-1]. */
int findMin(const int *arr, int len) {
    int min = arr[0];
    for (int i = 1; i < len; ++i)
        if (arr[i] < min)
            min = arr[i];
    return min;
}

int main(int argc, char *argv[]) {

    int size;
    int rank;
    const int VERY_LARGE_INT = 999999;
    const int ROOT = 0; // the master rank that holds A to begin with
    int tag = 1234;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &size); // think size = 4 for this example
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* 
       How many numbers you send from ROOT to each other rank. 
       Note that for this implementation to work, (size-1) must divide N.
    */
    int count = N/(size-1);

    int *localArray = (int *)malloc(count * sizeof(int));
    int localMin;  // minimum computed on rank i
    int globalMin; // will only be valid on rank == ROOT

    /* rank == ROOT sends portion of A to every other rank */
    if (rank == ROOT) {

        for(int dest = 1; dest < size; ++dest) 
        {
            // If you are sending information from one rank to another, you use MPI_Send or MPI_Isend.
            // If you are sending information from one rank to ALL others, then every rank must call MPI_Bcast (similar to MPI_Reduce below)
            MPI_Send(&A[(dest-1)*count], count, MPI_INT, dest, tag, MPI_COMM_WORLD);
            printf("P0 sent a %d elements to P%d.\n", count, dest);
        }
        localMin = VERY_LARGE_INT; // needed for MPI_Reduce below
    }

    /* Every other rank is receiving one message: from ROOT into local array */
    else {
        MPI_Recv(localArray, count, MPI_INT, ROOT, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        localMin = findMin(localArray, count);
        printf("Min for P%d is %d.\n", rank, localMin);
    }

    /* 
       At this point, every rank in communicator has valid information stored in localMin. 
       Use MPI_Reduce in order to find the global min among all ranks.
       Store this single globalMin on rank == ROOT.
    */
    MPI_Reduce(&localMin, &globalMin, 1, MPI_INT, MPI_MIN, ROOT, MPI_COMM_WORLD);

    if (rank == ROOT)
        printf("Global min: %d\n", globalMin);

    free(localArray);

    /* The last thing you do is finalize MPI. Nothing should come after. */
    MPI_Finalize();
    return 0;
}

Full disclosure: I have not tested this code, but apart from minor typos it should work.
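A quick note on trying it yourself (the file name min.c here is just illustrative): with a typical MPI installation you would compile with mpicc min.c -o min and launch with mpirun -np 4 ./min, so that size = 4 and each of the three non-root ranks receives N/(size-1) = 3 elements.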

Look over this code and see whether you can understand why I moved your MPI_Send and MPI_Recv calls around. To understand this, note that every rank executes every line of code you supply. Therefore, inside your else statement there should not be a for loop around the receive: each non-root rank receives exactly one message.

Additionally, MPI collectives (such as MPI_Reduce and MPI_Bcast) must be called by every rank in the communicator. The "source" and "destination" ranks of these calls are either part of the function's input parameters or implied by the collective itself.
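To illustrate that last point, here is a minimal, self-contained sketch (mine, not part of the original answer) of a broadcast: every rank makes the identical MPI_Bcast call, and the root argument inside the call, not which rank happens to be executing it, determines who supplies the data.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {

    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42; // only the root has the value before the broadcast

    /* Every rank calls MPI_Bcast; rank 0 sends, all others receive. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d now has value %d.\n", rank, value);

    MPI_Finalize();
    return 0;
}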

Finally, a little homework for you: can you see why this is not a good implementation for finding the global minimum of the array A? Hint: what is rank == ROOT doing after it completes its MPI_Sends? How could you split the problem up better so that every rank performs a more even share of the work?
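If you want to check your reasoning against one possible direction (a sketch of mine, not the only answer): MPI_Scatter hands an equal chunk to every rank, root included, and MPI_Reduce then combines the local minima, so no rank sits idle. It assumes size divides N evenly.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define N 8 // chosen so that it divides evenly among, e.g., 4 ranks

int main(int argc, char *argv[]) {

    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int count = N / size; // assumes size divides N
    int A[N];             // only meaningful on the root
    if (rank == 0)
        for (int i = 0; i < N; ++i)
            A[i] = N - i; // dummy data

    int *chunk = malloc(count * sizeof(int));

    /* Every rank (root included) receives an equal share of A. */
    MPI_Scatter(A, count, MPI_INT, chunk, count, MPI_INT, 0, MPI_COMM_WORLD);

    int localMin = chunk[0];
    for (int i = 1; i < count; ++i)
        if (chunk[i] < localMin)
            localMin = chunk[i];

    int globalMin;
    MPI_Reduce(&localMin, &globalMin, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Global min: %d\n", globalMin);

    free(chunk);
    MPI_Finalize();
    return 0;
}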