mpirun: Unrecognized argument mca
I have a C++ solver that I need to run in parallel with the following command:
nohup mpirun -np 16 ./my_exec > log.txt &
This command runs my_exec independently on the 16 processors available on my node. It used to work perfectly.
Last week the HPC department performed an OS upgrade, and now, when I launch the same command, I get two warning messages (one per processor). The first is:
--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

Local host: tamnun
Registerable memory: 32768 MiB
Total memory: 98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Then I get output from my code telling me that it thinks only one instance of the code was launched (Nprocs = 1 instead of 16):
# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there
Finally, the second warning message is:
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

Local host: tamnun (PID 17446)
MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
After looking online, I tried, as the warning message suggests, to set the MCA parameter mpi_warn_on_fork to 0 with the following command:
nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &
which produced the following error message:
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
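For what it's worth, Open MPI can also pick up MCA parameters from environment variables of the form OMPI_MCA_<name>, so a minimal alternative sketch, assuming the mpirun actually being invoked is Open MPI's, would be:

export OMPI_MCA_mpi_warn_on_fork=0         # only honoured by Open MPI processes
nohup mpirun -np 16 ./my_exec > log.txt &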
I am running RedHat 6.7 (Santiago). I have contacted the HPC department, but since I am at a university it may take a day or two to get an answer. Any help or guidance would be greatly appreciated.
Edit in response to the answer:
Indeed, I compile my code with Open MPI's mpic++, but I was running the executable with Intel's mpirun command, hence the error (after the OS upgrade, Intel's mpirun was set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
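A minimal sketch of that fix (the Open MPI install prefix below is hypothetical; the real one comes from the cluster's environment or module system):

export PATH=/opt/openmpi/bin:$PATH   # hypothetical Open MPI prefix
which mpirun                         # should now resolve inside the Open MPI tree
mpirun --version                     # Open MPI reports itself as "mpirun (Open MPI) x.y.z"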
The code now runs as expected, but I still get the first warning message above (it no longer suggests that I use the MCA parameter mpi_warn_on_fork). I think (but am not sure) that this is an issue I need to sort out with the HPC department.
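For reference, the FAQ entry linked in that warning points at the InfiniBand driver's memory translation table sizing and the locked-memory limit. A rough sketch of what checking them looks like, assuming a Mellanox mlx4 HCA (paths and values may differ on this cluster, and changing them needs root, which is why it is the HPC department's call):

ulimit -l                                               # locked-memory limit; should be "unlimited"
cat /sys/module/mlx4_core/parameters/log_num_mtt        # current MTT sizing
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
# Registerable memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page_size,
# so raising it means something like the following (illustrative values) in
# /etc/modprobe.d/mlx4_core.conf, followed by a driver reload:
#   options mlx4_core log_num_mtt=24 log_mtts_per_seg=3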
[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
                                  ^^^^^
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
                                                  ^^^^^
You are using MPICH in the last case. MPICH is not Open MPI, and its process launcher does not recognize the --mca argument, which is specific to Open MPI (MCA stands for Modular Component Architecture, the foundational framework that Open MPI is built on). A typical case of mixing multiple MPI implementations.
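A quick way to check which implementations are actually in play on a mixed system (a sketch; exact output formats vary by version, and only one of the --showme/-show wrapper flags will work):

which mpirun && mpirun --version   # Open MPI prints "mpirun (Open MPI) ..."; Hydra-based launchers (MPICH, Intel MPI) report their own build details
which mpic++ && mpic++ --showme    # Open MPI wrappers accept --showme; MPICH-style wrappers use -show instead
ldd ./my_exec | grep -i libmpi     # shows which MPI library the executable was actually linked against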