How to get all ranks in MPI to send a value to rank 0, which then does a blocking receive on all of them?
Suppose I have n processes:
They do a calculation and then send the result to rank 0. This is what I want to happen:
Rank 0 waits until it has a result from all of the ranks and then adds them up.
How do I do this? I also want to avoid the following:
E.g. for 4 processes P0, P1, P2, P3,
P1 -> P0
P2 -> P0
P3 -> P0
while in the meantime P1 finishes another computation and P1 -> P0 happens again.
I want P0 to add up the results from the 3 processes for one cycle only, and then do it again for the next cycle.
Can someone suggest an MPI function to do this? I know of MPI_Gather, but I'm not sure whether it blocks.
I came up with this:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int pross, rank, p_count = 0;
    int count = 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &pross);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int *num = malloc((pross - 1) * sizeof(int));
    if (rank != 0)
    {
        MPI_Send(&count, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }
    else
    {
        MPI_Gather(&count, 1, MPI_INT, num, 1, MPI_INT, 0, MPI_COMM_WORLD);
        for (int ii = 0; ii < pross - 1; ii++)
        {
            printf("\n NUM %d \n", num[ii]);
            p_count += num[ii];
        }
    }
    MPI_Finalize();
}
I get the error:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11630)[0x7fb3e3bc3630]
[ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x90925)[0x7fb3e387b925]
[ 2] /usr/lib/libopen-pal.so.13(+0x30177)[0x7fb3e3302177]
[ 3] /usr/lib/libmpi.so.12(ompi_datatype_sndrcv+0x54c)[0x7fb3e3e1e3ec]
[ 4] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_gather_intra_basic_linear+0x143)[0x7fb3d53d9063]
[ 5] /usr/lib/libmpi.so.12(PMPI_Gather+0x1ba)[0x7fb3e3e29a3a]
[ 6] sosuks(+0xe83)[0x55ee72119e83]
[ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fb3e380b3f1]
[ 8] sosuks(+0xb5a)[0x55ee72119b5a]
*** End of error message ***
I also tried:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int pross, rank, p_count = 0;
    int count = 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &pross);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    int *num = malloc((pross - 1) * sizeof(int));
    if (rank != 0)
    {
        MPI_Send(&count, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }
    else
    {
        MPI_Gather(&count, 1, MPI_INT, num, 1, MPI_INT, 0, MPI_COMM_WORLD);
        for (int ii = 0; ii < pross - 1; ii++)
        {
            printf("\n NUM %d \n", num[ii]);
            p_count += num[ii];
        }
    }
    MPI_Finalize();
}
Here I get the error:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x560600000002
[ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11630)[0x7fefc8c11630]
[ 1] mdscisuks(+0xeac)[0x5606c1263eac]
[ 2] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fefc88593f1]
[ 3] mdscisuks(+0xb4a)[0x5606c1263b4a]
*** End of error message ***
For the second attempt, the thing to note is that the sends and receives succeed, but for some reason root only receives 2 messages from the ranks. The segmentation fault is because num ends up with only two elements, and I don't understand why num is only received into twice.
I run the code as
mpiexec -n 6 ./sosuks
Can someone show me a better/correct way to implement my idea?
Update:
Apart from the answer below, I found the bugs in my implementation above, which I want to share:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int pross, rank, p_count = 0;
    int count = 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &pross);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Status status;
    if (rank != 0)
    {
        MPI_Send(&count, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }
    else
    {
        // rank 0 receives one value from every other rank and accumulates them.
        int var = 0, lick;
        for (lick = 1; lick < pross; lick++)
        {
            int fetihs;
            MPI_Recv(&fetihs, 1, MPI_INT, lick, 1, MPI_COMM_WORLD, &status);
            var += fetihs;
        }
        // do things with var
    }
    MPI_Finalize();
}
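The MPI_Gather attempts above were also broken in themselves: MPI_Gather is a blocking collective, so every rank in the communicator has to call it (pairing MPI_Send on the workers with MPI_Gather only on root is not valid), and the root's receive buffer needs room for one entry per rank, including root's own contribution. A minimal sketch of a corrected MPI_Gather version (untested here; it keeps the variable names from above):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    // sketch: corrected MPI_Gather version of the attempts above (not the original posted code)
    int pross, rank, p_count = 0;
    int count = 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &pross);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // only the root needs a receive buffer, but it must hold pross entries,
    // one per rank (including the root's own count).
    int *num = NULL;
    if (rank == 0)
        num = malloc(pross * sizeof(int));

    // every rank calls the collective; it returns once the buffers may be reused.
    MPI_Gather(&count, 1, MPI_INT, num, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
    {
        for (int ii = 0; ii < pross; ii++)
            p_count += num[ii];
        printf("p_count = %d\n", p_count);
        free(num);
    }

    MPI_Finalize();
}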
For your case, as Sneftel pointed out, you need MPI_Reduce. Also, no explicit synchronization is needed before the cycle completes.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    int pross, rank, p_count, count = 10;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &pross);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // master does not send data to itself.
    // only workers send data to master.
    for (int i = 0; i < 3; ++i)
    {
        // to prove that no further sync is needed:
        // you will get the same answer in each cycle.
        p_count = 0;

        if (rank == 0)
        {
            // this has no effect, since master uses p_count for both
            // send and receive buffers due to MPI_IN_PLACE.
            count = 500;
            MPI_Reduce(MPI_IN_PLACE, &p_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        }
        else
        {
            // for a slave, p_count is irrelevant; the receive buffer is ignored on non-root ranks.
            MPI_Reduce(&count, NULL, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        }

        if (rank == 0)
        {
            printf("p_count = %i\n", p_count);
        }

        // slaves send their data to master before the cycle completes.
        // no need for explicit sync such as MPI_Barrier.
        // MPI_Barrier(MPI_COMM_WORLD); // no need.
    }

    MPI_Finalize();
}
In the code above, count on each slave is reduced into p_count on the master. Note the MPI_IN_PLACE and the two MPI_Reduce calls. You can get the same functionality by simply setting count = 0 and calling MPI_Reduce on all ranks without MPI_IN_PLACE:
for (int i = 0; i < 3; ++i)
{
    p_count = 0;
    if (rank == 0) count = 0;
    MPI_Reduce(&count, &p_count, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
}
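Either way, with 6 processes as in the original mpiexec -n 6 invocation, each cycle should print p_count = 50: ranks 1 through 5 each contribute count = 10, and rank 0 contributes nothing (its contribution is the zeroed p_count in the MPI_IN_PLACE version, or count = 0 in the variant above).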