OpenMPI -host and -hostfile options
With OpenMPI v2, when I run a test program using -host, it works; that is, the processes are spread across the hosts I specified. However, when I specify -hostfile, it doesn't work!
mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -host compute-0-0,cluster -np 2 a.out
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.
****************************************************************************
Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
Hello world from processor compute-0-0.local, rank 0 out of 2 processors
mahmood@cluster:mpitest$ cat hosts
cluster
compute-0-0
mahmood@cluster:mpitest$ /share/apps/computer/openmpi-2.0.1/bin/mpirun -hostfile hosts -np 2 a.out
****************************************************************************
* hwloc 1.11.2 has encountered what looks like an error from the operating system.
*
* Package (P#1 cpuset 0xffff0000) intersects with NUMANode (P#1 cpuset 0xff00ffff) without inclusion!
* Error occurred in topology.c line 1048
*
* The following FAQ entry in the hwloc documentation may help:
* What should I do when hwloc reports "operating system" warnings?
* Otherwise please report this error message to the hwloc user's mailing list,
* along with the output+tarball generated by the hwloc-gather-topology script.
****************************************************************************
Hello world from processor cluster.hpc.org, rank 0 out of 2 processors
Hello world from processor cluster.hpc.org, rank 1 out of 2 processors
What is the problem, and how can I fix it?
The hosts listed in the -host argument are given one slot each, so -host A,B means one slot on host A and one slot on host B.
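To force mpiexec to start N processes per node, use the following option:
--map-by ppr:N:node
In your case, for one process per node, that is --map-by ppr:1:node. Alternatively, you can limit each host to a single slot by changing the hostfile to look like this: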
cluster slots=1 max_slots=1
compute-0-0 slots=1 max_slots=1
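(slots=1 is supposed to be the default if not supplied, but the -hostfile run above placed both ranks on cluster, so it is safer to set it explicitly.)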
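Putting the two suggestions together, a minimal sketch might look like the following. The variable names MPIRUN and HOSTFILE and the file name hosts.oneslot are illustrative assumptions, not something from the question; point MPIRUN at your own install (for example the openmpi-2.0.1 path used above). The script only writes the hostfile unless both the launcher and ./a.out are present.

```shell
#!/bin/sh
# Sketch only: MPIRUN, HOSTFILE and hosts.oneslot are hypothetical names.
MPIRUN="${MPIRUN:-mpirun}"
HOSTFILE="hosts.oneslot"

# Hostfile from the answer: cap each node at exactly one slot.
cat > "$HOSTFILE" <<'EOF'
cluster slots=1 max_slots=1
compute-0-0 slots=1 max_slots=1
EOF

# Launch only when both the launcher and the test binary are available.
if command -v "$MPIRUN" >/dev/null 2>&1 && [ -x ./a.out ]; then
    # Option 1: force one process per node at mapping time.
    "$MPIRUN" -hostfile "$HOSTFILE" -np 2 --map-by ppr:1:node ./a.out
    # Option 2: rely on the one-slot hostfile alone.
    "$MPIRUN" -hostfile "$HOSTFILE" -np 2 ./a.out
else
    echo "mpirun or ./a.out not found; wrote $HOSTFILE only"
fi
```

If either option works as the answer describes, the two "Hello world" lines should report different processor names, as they did in the -host run.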