Can this be the most atomic "cancel if not received" for a multithreaded MPI_Irecv?

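The problem arises in a multithreaded setting, where several (e.g. 5) threads each start listening with MPI_Irecv, using MPI_ANY_SOURCE as the source, and then go on working. Before exiting the function, each thread should check whether a message was received and, if not, cancel the request to free the memory.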

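It is assumed that a message arrives at only one of the N (e.g. 5) threads. The concern is what happens if a message does arrive between (1) checking whether a message has arrived and (2) cancelling the request because that check returned false.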

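As a side note, using a single receiver that writes to an atomically accessed queue should solve it. But that would mean a major code refactoring and would probably reduce performance.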
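
For reference, the single-receiver alternative mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: `Request` and `RequestQueue` are hypothetical names, and the dedicated thread that would call a blocking MPI receive and `push` the result is not shown.

```cpp
#include <mutex>
#include <queue>

// Hypothetical record for one incoming request (names are illustrative).
struct Request {
    double value;  // payload that would otherwise land in `buffer`
    int source;    // sending rank, i.e. status.MPI_SOURCE
};

// Minimal mutex-protected queue: one receiver thread pushes, workers pop.
class RequestQueue {
public:
    void push(const Request& r) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(r);
    }

    // Non-blocking: returns false and leaves `out` untouched when empty.
    bool try_pop(Request& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<Request> items_;
};
```

With this layout only one thread ever talks to MPI for these requests, so no receive has to be cancelled; the cost is the refactoring and the extra hop through the queue.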

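The question is whether the MPI standard provides an answer to this problem, and if so what it is, or whether the (pseudo)code below is indeed sufficient protection.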

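The proposed solution looks dubious, because the logs (see below) only ever show the combination "Irecv did not catch a message + failed to cancel the associated request". There seems to be no memory issue.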

main.cpp

//...
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
    error_report("[error] The MPI did not provide the requested threading behaviour.");
}
//...

And in the relevant function:

// Start receiving
MPI_Irecv(&buffer, 1, MPI_DOUBLE,
                      MPI_ANY_SOURCE,
                      VERTEXVAL_REQUEST_FLAG,
                      MPI_COMM_WORLD,
                      &R);

// some work goes on here ... 

// Before exiting, we check if a message arrived. 

int flag1=-437, flag2=-437; // any initialization

MPI_Status status1, status2;
status2.MPI_ERROR = -999; // again, any initialization
status1.MPI_ERROR = -999;
MPI_Test(&R, &flag1, &status1);

if (flag1 != 1){
    MPI_Cancel(&R);
    MPI_Test_cancelled(&status2, &flag2);
}
if ((flag1 == 1) || ((flag1!=1) && (flag2!=1))) {

    if (flag1 == 1) {
        build_answer(answer, REF, buffer, status1.MPI_SOURCE, MYPROC);
        printf("A request failed to be cancelled, we are assuming we recieved it! we computed val = %f, recieved buffer = %f ; flags12 = %d %d ;  source = %d ; tag = %d; error = %d\n",
           answer, buffer, flag1, flag2, status1.MPI_SOURCE, status1.MPI_TAG, status1.MPI_ERROR);
        std::cout << std::flush;

        MPI_Ssend(&answer, 1, MPI_DOUBLE, status1.MPI_SOURCE, (int) buffer, MPI_COMM_WORLD);

        printf("Completed!\n");
        std::cout << std::flush;

    } else {
        printf("A request failed to be cancelled: will ignore it. Recieved buffer = %f ; flags12 = %d %d ;  source = %d ; tag = %d ; status error = %d\n",
           buffer, flag1, flag2, status2.MPI_SOURCE, status2.MPI_TAG, status2.MPI_ERROR);
        std::cout << std::flush;
    }
}

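This "protection" seems to have fixed a deadlock that used to occur about once in a thousand runs, because the previous version simply assumed that a failed cancellation meant the message had arrived. In particular, the log entries show the following values printed by printf: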
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22020 ;  source = 2 ; tag = 0 ; status error = -183549351
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = -0.000000 ; flags12 = 0 21998 ;  source = 2 ; tag = 0 ; status error = -1563532711
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ;  source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ;  source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22033 ;  source = 2 ; tag = 0 ; status error = -691551655
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 8.000000 ; flags12 = 0 0 ;  source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 8.000000 ; flags12 = 0 0 ;  source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ;  source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = -0.000000 ; flags12 = 0 21998 ;  source = 2 ; tag = 0 ; status error = -1563532711
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22033 ;  source = 2 ; tag = 0 ; status error = -691551655

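Have a look at MPI_Mprobe / MPI_Mrecv; they are made for exactly your multithreaded scenario. There is no need to cancel a receive. See https://www.slideshare.net/jsquyres/mpimprobe-is-good-for-you for details.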