Limit MPI to run on a single GPU even if we have a single-node multi-GPU setup

Question

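I am new to distributed computing and I am trying to run a program that uses MPI and ROCm (AMD's framework for running on GPUs).

The command I use to run the program is mpirun -np 4 ./a.out

But by default it runs on both of the GPUs available in my machine. Is there a way to make it run on a single GPU only, and if so, how?

Thanks in advance :)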

Answer 1

You can control the active GPU(s) by setting some environment variables (e.g. GPU_DEVICE_ORDINAL, ROCR_VISIBLE_DEVICES or HIP_VISIBLE_DEVICES; see this or this for more details).

For example:

export HIP_VISIBLE_DEVICES=0
mpirun -np 4 ./a.out
# or 
HIP_VISIBLE_DEVICES=0 mpirun -np 4 ./a.out

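Note that some MPI implementations do not export all environment variables to the launched processes, or may reload your bashrc or cshrc. So it is safer to use the MPI launcher's own syntax to set environment variables: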

# with openmpi 
mpirun -x HIP_VISIBLE_DEVICES=0 -np 4 ./a.out

# or with mpich
mpiexec -env HIP_VISIBLE_DEVICES 0 -n 4 ./a.out

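To be on the safe side, it is probably a good idea to add this to your C++ code:

#include <cstdlib>   // std::getenv
#include <iostream>  // std::cout
// ...
// Report which GPUs this process is allowed to see.
const char* hip_visible_devices = std::getenv("HIP_VISIBLE_DEVICES");
if (hip_visible_devices) std::cout << "Running on GPUs: " << hip_visible_devices << std::endl;
else std::cout << "Running on all GPUs!" << std::endl;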
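
If you want each rank to check the restriction rather than just print it, something along these lines might help. This is only a minimal sketch: split_device_list and restricted_to_single_gpu are hypothetical helper names of my own, not part of HIP, ROCm or any MPI library, and I am assuming the variable holds a comma-separated list of devices.

```cpp
#include <cstdlib>   // std::getenv
#include <sstream>   // std::stringstream
#include <string>
#include <vector>

// Hypothetical helper (not a HIP/ROCm API): split a comma-separated
// device list such as "0" or "0,1" into its non-empty entries.
std::vector<std::string> split_device_list(const std::string& value) {
    std::vector<std::string> devices;
    std::stringstream stream(value);
    std::string entry;
    while (std::getline(stream, entry, ',')) {
        if (!entry.empty()) devices.push_back(entry);
    }
    return devices;
}

// Hypothetical helper: true only when the environment variable `name`
// is set and lists exactly one device. An unset or empty variable
// yields false.
bool restricted_to_single_gpu(const char* name = "HIP_VISIBLE_DEVICES") {
    const char* value = std::getenv(name);
    if (value == nullptr) return false;  // unset: nothing restricts the GPUs
    return split_device_list(value).size() == 1;
}
```

Each MPI rank could call restricted_to_single_gpu() at startup and print a warning (or abort) when it returns false, which would catch the case where the launcher did not forward the variable.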

(Note that CUDA has both an envvar and a C function cudaSetDevice(id); I wonder if there is an equivalent for AMD or OpenCL.)

hpc

gpgpu

distributed-computing

mpi

amd-rocm
