Why does my C++ parallel program give an MPI fatal error in MPI_Gather?

My sorting program works fine with an even number of elements in the array, but gives the error

" Fatal error in MPI_Gather: Message truncated, error stack: MPI_Gather(sbuf=0x00A2A700, scount=4, MPI_INT, rbuf=0x00A302C8, rcount=4, MPI_INT, root=0, MPI_COMM_WORLD) failed Message from rank 1 and tag -1342177184 truncated; 28 bytes received but buffer size is 16 "

for an odd number of elements in the array. The problem starts at the line if ((world_rank == 1) && (n % world_size != 0)). I have tried everything, but nothing works. How can I fix this? Thanks in advance!

#include <mpi.h>
#include <cstdio>
#include <cstdlib>
#include <ctime>

void merge(int*, int*, int, int, int);
void mergeSort(int*, int*, int, int);

int main(int argc, char** argv) {

    int world_rank;
    int world_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int n = atoi(argv[1]);
    int* original_array{ new int[n] {} };

    int c;
    srand(time(NULL));

    if (world_rank == 0) {
        printf("This is the unsorted array: ");
        for (c = 0; c < n; c++) {
            original_array[c] = rand() % n;
            printf("%d ", original_array[c]);
        }
        printf("\n");
        printf("\n");
    }
    
    
    int size = n / world_size;
    int* sub_array=NULL;
    int* tmp_array = NULL;
    int* sorted = NULL;

    if (world_rank == 0) {

        sorted = { new int[n] {} };

    }


    if ((world_rank == 1) && (n % world_size != 0)) {
        int r = n % world_size;
        int size2 = size + r;
        sub_array = { new int[size2] {} };
        MPI_Scatter(original_array, size2, MPI_INT, sub_array, size2, MPI_INT, 0, MPI_COMM_WORLD);
        tmp_array = { new int[size2] {} };
        mergeSort(sub_array, tmp_array, 0, (size2 - 1));
        MPI_Gather(sub_array, size2, MPI_INT, sorted, size2, MPI_INT, 0, MPI_COMM_WORLD);
    }
    else {
        sub_array = { new int[size] {} };
        MPI_Scatter(original_array, size, MPI_INT, sub_array, size, MPI_INT, 0, MPI_COMM_WORLD);
        tmp_array = { new int[size] {} };
        mergeSort(sub_array, tmp_array, 0, (size - 1));
        MPI_Gather(sub_array, size, MPI_INT, sorted, size, MPI_INT, 0, MPI_COMM_WORLD);
    }

    if (world_rank == 0) {

        printf("Array state before final mergeSort call: ");
        for (c = 0; c < n; c++) {

            printf("%d ", sorted[c]);

        }
        
        printf("\n");

        int* other_array{ new int[n] {} };
        mergeSort(sorted, other_array, 0, (n - 1));

        printf("This is the sorted array: ");
        for (c = 0; c < n; c++) {

            printf("%d ", sorted[c]);

        }

        printf("\n");
        printf("\n");

        delete[] sorted;
        delete[] other_array;

    }

    delete[] original_array;
    delete[] sub_array;
    delete[] tmp_array;

    /********** Finalize MPI **********/
    MPI_Finalize();

}

TL;DR: With an even number of elements in the array, all processes call MPI_Scatter and MPI_Gather with the same count; with an odd number they do not.

My sorting program works fine with an even number of elements in the array

When the array size is even, all processes execute the else branch:

    if ((world_rank == 1) && (n % world_size != 0)) {
        int r = n % world_size;
        int size2 = size + r;
        sub_array = { new int[size2] {} };
        MPI_Scatter(original_array, size2, MPI_INT, sub_array, size2, MPI_INT, 0, MPI_COMM_WORLD);
        tmp_array = { new int[size2] {} };
        mergeSort(sub_array, tmp_array, 0, (size2 - 1));
        MPI_Gather(sub_array, size2, MPI_INT, sorted, size2, MPI_INT, 0, MPI_COMM_WORLD);
    }
    else {
        sub_array = { new int[size] {} };
        MPI_Scatter(original_array, size, MPI_INT, sub_array, size, MPI_INT, 0, MPI_COMM_WORLD);
        tmp_array = { new int[size] {} };
        mergeSort(sub_array, tmp_array, 0, (size - 1));
        MPI_Gather(sub_array, size, MPI_INT, sorted, size, MPI_INT, 0, MPI_COMM_WORLD);
    }

but gives the error " Fatal error in MPI_Gather: Message truncated, error stack: MPI_Gather(sbuf=0x00A2A700, scount=4, MPI_INT, rbuf=0x00A302C8, rcount=4, MPI_INT, root=0, MPI_COMM_WORLD) failed Message from rank 1 and tag -1342177184 truncated; 28 bytes received but buffer size is 16 " for an odd number of elements in the array.

However, when the array size is odd, process 1 executes the if branch above while the other processes execute the else branch. Consequently, some processes call the routines MPI_Scatter and MPI_Gather with a different count, which is exactly what your error message shows: rank 1 sent 7 ints (28 bytes) while the root expected only 4 ints per rank (rcount=4, i.e. 16 bytes). All processes should call these routines in the same way.

To fix your code, you can change it so that all processes call MPI_Scatter and MPI_Gather with the same count.
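For illustration, here is a minimal sketch of that idea: the root pads its buffers with dummy values so that the total size is divisible by world_size and every rank uses the identical count. It is not your exact program: std::sort stands in for the merge/mergeSort routines that are not shown, and INT_MAX is used as the dummy value so the padding sorts to the end and can simply be skipped when printing.

    #include <mpi.h>
    #include <algorithm>
    #include <climits>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int world_rank, world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        int n = atoi(argv[1]);
        int chunk = (n + world_size - 1) / world_size;   // ceil(n / world_size)
        int padded_n = chunk * world_size;               // divisible by world_size

        int* send_buf = NULL;
        int* gathered = NULL;
        if (world_rank == 0) {
            send_buf = new int[padded_n];
            gathered = new int[padded_n];
            srand(time(NULL));
            for (int i = 0; i < n; i++)        send_buf[i] = rand() % n;
            for (int i = n; i < padded_n; i++) send_buf[i] = INT_MAX;  // dummy padding
        }

        // Every rank passes the same count, so Scatter and Gather always match.
        int* chunk_buf = new int[chunk];
        MPI_Scatter(send_buf, chunk, MPI_INT, chunk_buf, chunk, MPI_INT, 0, MPI_COMM_WORLD);

        std::sort(chunk_buf, chunk_buf + chunk);         // stand-in for the local mergeSort

        MPI_Gather(chunk_buf, chunk, MPI_INT, gathered, chunk, MPI_INT, 0, MPI_COMM_WORLD);

        if (world_rank == 0) {
            std::sort(gathered, gathered + padded_n);    // stand-in for the final merge
            printf("This is the sorted array: ");
            for (int i = 0; i < n; i++) printf("%d ", gathered[i]);  // padding sorts last
            printf("\n");
            delete[] send_buf;
            delete[] gathered;
        }
        delete[] chunk_buf;
        MPI_Finalize();
    }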

In general, you will run into this same problem whenever the input size is not evenly divisible by the number of processes. To solve it, you can add dummy values to the array so that its size becomes divisible by the number of processes (as in the sketch above), or you can use MPI_Gatherv:

Gathers into specified locations from all processes in a group

MPI_Scatterv

Scatters a buffer in parts to all processes in a communicator

From the source one can read:

MPI_Gatherv and MPI_Scatterv are the variable-message-size versions of MPI_Gather and MPI_Scatter. MPI_Gatherv extends the functionality of MPI_Gather to permit a varying count of data from each process, and to allow some flexibility in where the gathered data is placed on the root process. It does this by changing the count argument from a single integer to an integer array and providing a new argument displs (an array) . MPI_Scatterv extends MPI_Scatter in a similar manner. More information on the use of these routines will be presented in an Application Example later in this module.
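For completeness, here is a minimal sketch of the MPI_Scatterv/MPI_Gatherv variant, again with std::sort standing in for the local and final merge sorts that are not shown. Each rank receives n / world_size elements, the first n % world_size ranks receive one extra, and no padding is needed because the per-rank counts and displacements are passed explicitly.

    #include <mpi.h>
    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int world_rank, world_size;
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        int n = atoi(argv[1]);

        // Per-rank counts and displacements: the first (n % world_size) ranks
        // get one extra element, so any n works without padding.
        int* counts = new int[world_size];
        int* displs = new int[world_size];
        int offset = 0;
        for (int i = 0; i < world_size; i++) {
            counts[i] = n / world_size + (i < n % world_size ? 1 : 0);
            displs[i] = offset;
            offset += counts[i];
        }

        int* original_array = NULL;
        int* gathered = NULL;
        if (world_rank == 0) {
            original_array = new int[n];
            gathered = new int[n];
            srand(time(NULL));
            for (int i = 0; i < n; i++) original_array[i] = rand() % n;
        }

        int my_count = counts[world_rank];
        int* sub_array = new int[my_count];

        // Variable-count versions of MPI_Scatter / MPI_Gather.
        MPI_Scatterv(original_array, counts, displs, MPI_INT,
                     sub_array, my_count, MPI_INT, 0, MPI_COMM_WORLD);

        std::sort(sub_array, sub_array + my_count);      // stand-in for the local mergeSort

        MPI_Gatherv(sub_array, my_count, MPI_INT,
                    gathered, counts, displs, MPI_INT, 0, MPI_COMM_WORLD);

        if (world_rank == 0) {
            std::sort(gathered, gathered + n);           // stand-in for the final merge
            printf("This is the sorted array: ");
            for (int i = 0; i < n; i++) printf("%d ", gathered[i]);
            printf("\n");
            delete[] original_array;
            delete[] gathered;
        }
        delete[] sub_array;
        delete[] counts;
        delete[] displs;
        MPI_Finalize();
    }

Either way, every process takes part in the same collective call with consistent counts, so the "Message truncated" error goes away.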