How to force an mpi4py installation to use GNU MPI instead of Intel MPI on a cluster

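Question

I am trying to use mpi4py on a cluster. Because of other dependencies, I must use the GNU toolchain rather than Intel. However, the cluster has both compiler suites installed, and I cannot force mpi4py to build with the GNU compilers.

Problem

Failed attempts

I first tried unloading the Intel modules and loading the GNU Open MPI module, which gives me: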

me@cluster:~$ module purge
me@cluster:~$ module load python
me@cluster:~$ source .virtualenvs/py36env/bin/activate
(py36env) me@cluster:~$ module load openmpi/gcc/9.1/4.0.1 
(py36env) me@cluster:~$ module list
Currently Loaded Modulefiles:
  1) python/3.6              2) openmpi/gcc/9.1/4.0.1

(py36env) me@cluster:~$ which mpicc
/usr/local/Cluster-Apps/openmpi/gnu/4.0.1-gcc-9.1/bin/mpicc

(py36env) me@cluster:~$ mpicc --version
gcc (GCC) 9.1.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

(py36env) me@cluster:~$ which mpirun
/usr/local/Cluster-Apps/openmpi/gnu/4.0.1-gcc-9.1/bin/mpirun

(py36env) me@cluster:~$ mpirun --version
mpirun (Open MPI) 4.0.1

Report bugs to http://www.open-mpi.org/community/help/

However, when I pip install mpi4py, it uses the Intel compilers, despite my efforts to unload them:

(py36env) me@cluster:~$ pip install mpi4py
Collecting mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-3.0.2

(py36env) me@cluster:~$ python -c "import mpi4py; print(mpi4py.get_config())"
{'mpicc': '/usr/local/Cluster-Apps/intel/2017.4/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/mpicc', 
 'mpicxx': '/usr/local/Cluster-Apps/intel/2017.4/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/mpicxx', 
 'mpifort': '/usr/local/Cluster-Apps/intel/2017.4/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/mpif90', 
 'mpif90': '/usr/local/Cluster-Apps/intel/2017.4/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/mpif90', 
 'mpif77': '/usr/local/Cluster-Apps/intel/2017.4/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/mpif77'}

I get the same result even when I specify the MPI compiler wrapper explicitly with

$ env MPICC=/usr/local/Cluster-Apps/openmpi/gnu/4.0.1-gcc-9.1/bin/mpicc pip install mpi4py

as suggested in the note at https://mpi4py.readthedocs.io/en/stable/install.html#using-pip-or-easy-install.

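Error

As pointed out in a comment on "mpiexec and python mpi4py gives rank 0 and size 1", building mpi4py against a different MPI implementation than the one providing the mpirun used to launch it leads to errors: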

(py36env) me@login-e-11:~$ mpirun -n 5 python -m mpi4py.bench helloworld
Hello, World! I am process 0 of 1 on login-e-11.
Hello, World! I am process 0 of 1 on login-e-11.
Hello, World! I am process 0 of 1 on login-e-11.
Hello, World! I am process 0 of 1 on login-e-11.
Hello, World! I am process 0 of 1 on login-e-11.

The output should actually be (see https://mpi4py.readthedocs.io/en/stable/install.html#testing):

$ mpirun -n 5 python -m mpi4py.bench helloworld
Hello, World! I am process 0 of 5 on localhost.
Hello, World! I am process 1 of 5 on localhost.
Hello, World! I am process 2 of 5 on localhost.
Hello, World! I am process 3 of 5 on localhost.
Hello, World! I am process 4 of 5 on localhost.
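
A mismatch like the one above can be spotted before launching a job by comparing the mpicc that mpi4py recorded at build time with the mpirun currently on PATH. The following is only a sketch: the helper names same_bin_dir and check_mpi4py_launcher are hypothetical, and the "same directory" test is an assumption that holds for installations like the ones shown here, where each MPI keeps its mpicc and mpirun in one bin/ directory.

```shell
# same_bin_dir PATH1 PATH2: succeed if both executables live in the same directory.
# (Hypothetical helper; "same directory" is a heuristic, not a guarantee.)
same_bin_dir() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# check_mpi4py_launcher: compare the mpicc recorded by mpi4py at build time
# with the mpirun currently found on PATH.
check_mpi4py_launcher() {
    built_with=$(python -c "import mpi4py; print(mpi4py.get_config().get('mpicc', ''))") || return 2
    launcher=$(command -v mpirun) || return 2
    if same_bin_dir "$built_with" "$launcher"; then
        echo "OK: mpi4py and mpirun both come from $(dirname "$launcher")"
    else
        echo "MISMATCH: mpi4py was built with $built_with but mpirun is $launcher"
        return 1
    fi
}
```

With the environment from the failed attempt, check_mpi4py_launcher would report a mismatch, because the configured mpicc lives under the Intel tree while mpirun comes from the Open MPI module.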

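When installing a package, pip caches recently built packages. To bypass the cache and force pip to rebuild the package, so that the build picks up the correct environment, use the --no-cache-dir option of pip install.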
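
As a rough sketch of how this might be combined with the MPICC variable from the question: the function name reinstall_mpi4py and the DRY_RUN switch are hypothetical conveniences, and the uninstall step is an added assumption (without it, pip would normally report the requirement as already satisfied and skip the rebuild).

```shell
# reinstall_mpi4py [MPICC_PATH]: rebuild mpi4py against the given MPI wrapper.
# Defaults to the mpicc found on PATH. Set DRY_RUN=1 to print the commands
# instead of running them. (Hypothetical helper, not part of pip or mpi4py.)
reinstall_mpi4py() {
    mpicc_path=${1:-$(command -v mpicc)}
    if [ -z "$mpicc_path" ]; then
        echo "mpicc not found; load an MPI module or pass a path" >&2
        return 1
    fi
    run=${DRY_RUN:+echo}

    # Remove the copy that was built against the wrong MPI.
    $run pip uninstall -y mpi4py
    # Rebuild from source, ignoring any cached wheel.
    $run env MPICC="$mpicc_path" pip install --no-cache-dir mpi4py
}
```

On the cluster from the question this might be called as reinstall_mpi4py /usr/local/Cluster-Apps/openmpi/gnu/4.0.1-gcc-9.1/bin/mpicc, after which python -c "import mpi4py; print(mpi4py.get_config())" should list the Open MPI wrappers if the rebuild picked up the intended compiler.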

See the Horovod documentation for further discussion.