Open MPI with CUDA installation issue

While trying to install Open MPI with CUDA support, I ran into the following failures during make.

btl_uct_module.c: In function ‘mca_btl_uct_reg_mem’:
btl_uct_module.c:214:22: error: ‘UCT_MD_MEM_ACCESS_REMOTE_GET’ undeclared (first use in this function)
         uct_flags |= UCT_MD_MEM_ACCESS_REMOTE_GET;
                      ^
btl_uct_module.c:214:22: note: each undeclared identifier is reported only once for each function it appears in
btl_uct_module.c:217:22: error: ‘UCT_MD_MEM_ACCESS_REMOTE_PUT’ undeclared (first use in this function)
         uct_flags |= UCT_MD_MEM_ACCESS_REMOTE_PUT;
                      ^
btl_uct_module.c:220:22: error: ‘UCT_MD_MEM_ACCESS_REMOTE_ATOMIC’ undeclared (first use in this function)
         uct_flags |= UCT_MD_MEM_ACCESS_REMOTE_ATOMIC;
                      ^
btl_uct_module.c:225:21: error: ‘UCT_MD_MEM_ACCESS_ALL’ undeclared (first use in this function)
         uct_flags = UCT_MD_MEM_ACCESS_ALL;
                     ^
Makefile:1912: recipe for target 'btl_uct_module.lo' failed
make[2]: *** [btl_uct_module.lo] Error 1
make[2]: Leaving directory '/home/usama/install/openmpi-4.0.1/opal/mca/btl/uct'
Makefile:2375: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/usama/install/openmpi-4.0.1/opal'
Makefile:1893: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

I configured and installed it with the following commands:

./configure --prefix=/home/$USER/.openmpi --with-cuda
make all install

I am using the following configuration:

Ubuntu 16.04

CUDA 10.1

cuDNN 7.5

Open MPI 4.0.1

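Strangely, I tried the same installation on my local machine, which runs Ubuntu 18.04, and it installed and ran fine. Is this some kind of compatibility issue? Any ideas?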

It turned out to be a compatibility issue. Using Open MPI 3.1.4 solved the problem.
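
For anyone who would rather stay on 4.0.1: the undeclared identifiers in the log are all UCT constants that come from the UCX headers. One plausible cause (an assumption, not something confirmed in this thread) is that the UCX installed on the Ubuntu 16.04 machine does not match what the btl/uct component in Open MPI 4.0.1 expects. Below is a minimal sketch for checking that. The helper name check_uct_flags and the /usr/include path are hypothetical, so point it at wherever your UCX headers actually live.

```shell
#!/bin/sh
# Report which of the UCT flags named in the error log are declared in the
# UCX headers found under <include_dir>/uct.
# check_uct_flags is a hypothetical helper, not part of Open MPI or UCX.
check_uct_flags() {
    include_dir="$1"
    missing=0
    for flag in UCT_MD_MEM_ACCESS_REMOTE_GET \
                UCT_MD_MEM_ACCESS_REMOTE_PUT \
                UCT_MD_MEM_ACCESS_REMOTE_ATOMIC \
                UCT_MD_MEM_ACCESS_ALL; do
        # -r: search recursively, -q: quiet, -w: match the whole identifier
        if grep -rqw "$flag" "$include_dir/uct" 2>/dev/null; then
            echo "found:   $flag"
        else
            echo "missing: $flag"
            missing=1
        fi
    done
    # Exit status 0 only if every flag was found
    return "$missing"
}

# Example call (the path is an assumption about where UCX is installed):
#   check_uct_flags /usr/include
#
# If flags are reported missing, one option to try is building without the
# UCT BTL component (untested on the setup described above):
#   ./configure --prefix=/home/$USER/.openmpi --with-cuda --enable-mca-no-build=btl-uct
#   make all install
```

If the check reports missing flags, the installed UCX headers are the likely mismatch. Excluding the component with --enable-mca-no-build=btl-uct, or installing a UCX release that this Open MPI version supports, may be enough to get 4.0.1 to build without downgrading. I have not verified either route on Ubuntu 16.04 with CUDA 10.1.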