Can this be the most atomic "cancel if not received" for a multithreaded MPI_Irecv?
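The problem arises in a multithreaded setting in which several (e.g. 5) threads each start listening with MPI_Irecv, using MPI_ANY_SOURCE as the source, and then carry on working. Before leaving the function, each thread should check whether a message was received and, if not, cancel the request to free memory.
A message is assumed to arrive at only one of the N (e.g. 5) threads. The issue is what happens if a message arrives between (1) checking whether a message has arrived and (2) cancelling the request because that check returned false.
As a side note, a single receiver writing to an atomically accessed queue should solve this, but that would mean a major refactoring of the code and might hurt performance.
The question is whether the MPI standard provides an answer to this problem and, if so, what it is, or whether the (pseudo)code below is in fact sufficient protection.
The proposed solution looks suspicious because the log (see below) only ever shows the combination "the irecv did not catch a message + the associated request failed to be cancelled". There seems to be no memory issue.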
In main.cpp:
//...
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE) {
error_report("[error] The MPI did not provide the requested threading behaviour.");
}
//...
In the relevant function:
// Start receiving
MPI_Irecv(&buffer, 1, MPI_DOUBLE,
MPI_ANY_SOURCE,
VERTEXVAL_REQUEST_FLAG,
MPI_COMM_WORLD,
&R);
// some work goes on here ...
// Before exiting, we check if a message arrived.
int flag1=-437, flag2=-437; // any initialization
MPI_Status status1, status2;
status2.MPI_ERROR = -999; // again, any initialization
status1.MPI_ERROR = -999;
MPI_Test(&R, &flag1, &status1);
if (flag1 != 1){
MPI_Cancel(&R);
MPI_Test_cancelled(&status2, &flag2);
}
if ((flag1 == 1) || ((flag1!=1) && (flag2!=1))) {
if (flag1 == 1) {
build_answer(answer, REF, buffer, status1.MPI_SOURCE, MYPROC);
printf("A request failed to be cancelled, we are assuming we recieved it! we computed val = %f, recieved buffer = %f ; flags12 = %d %d ; source = %d ; tag = %d; error = %d\n",
answer, buffer, flag1, flag2, status1.MPI_SOURCE, status1.MPI_TAG, status1.MPI_ERROR);
std::cout << std::flush;
MPI_Ssend(&answer, 1, MPI_DOUBLE, status1.MPI_SOURCE, (int) buffer, MPI_COMM_WORLD);
printf("Completed!\n");
std::cout << std::flush;
} else {
printf("A request failed to be cancelled: will ignore it. Recieved buffer = %f ; flags12 = %d %d ; source = %d ; tag = %d ; status error = %d\n",
buffer, flag1, flag2, status2.MPI_SOURCE, status2.MPI_TAG, status2.MPI_ERROR);
std::cout << std::flush;
}
}
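This 'protection' seems to have resolved a roughly one-in-a-thousand deadlock that used to occur in the program, because the previous version simply assumed that a failed cancellation meant the message had arrived. In particular, the log shows the following values printed via printf: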
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22020 ; source = 2 ; tag = 0 ; status error = -183549351
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = -0.000000 ; flags12 = 0 21998 ; source = 2 ; tag = 0 ; status error = -1563532711
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22033 ; source = 2 ; tag = 0 ; status error = -691551655
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 8.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 16.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 8.000000 ; flags12 = 0 0 ; source = 0 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 0 ; source = 1 ; tag = 25001 ; status error = 0
A request failed to be cancelled: will ignore it. Recieved buffer = -0.000000 ; flags12 = 0 21998 ; source = 2 ; tag = 0 ; status error = -1563532711
A request failed to be cancelled: will ignore it. Recieved buffer = 0.000000 ; flags12 = 0 22033 ; source = 2 ; tag = 0 ; status error = -691551655
Look at MPI_Mprobe and MPI_Mrecv; they are designed for exactly your multithreaded scenario. There is no need to cancel a receive. For details see https://www.slideshare.net/jsquyres/mpimprobe-is-good-for-you