How to properly combine GPUs when we have 2 NVIDIA/CUDA GPU cards with an NVLink hardware component

Question

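On Debian 10, I have 2 RTX A6000 GPU cards connected by an NVLink hardware component, and I would like to exploit the potential combined power of the two cards.

Currently, I have the following magma.make, which is called by my Makefile: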

CXX = nvcc -std=c++17 -O3
LAPACK = /opt/intel/oneapi/mkl/latest
LAPACK_ANOTHER=/opt/intel/mkl/lib/intel64
MAGMA = /usr/local/magma
INCLUDE_CUDA=/usr/local/cuda/include
LIBCUDA=/usr/local/cuda/lib64

SEARCH_DIRS_INCL=-I${MAGMA}/include -I${INCLUDE_CUDA} -I${LAPACK}/include
SEARCH_DIRS_LINK=-L${LAPACK}/lib/intel64 -L${LAPACK_ANOTHER} -L${LIBCUDA} -L${MAGMA}/lib

CXXFLAGS = -c -DMAGMA_ILP64 -DMKL_ILP64 -m64 ${SEARCH_DIRS_INCL}

LDFLAGS = ${SEARCH_DIRS_LINK} -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lcuda -lcudart -lcublas -lmagma -lpthread -lm -ldl -Xnvlink

SOURCES = main_magma.cpp XSAF_C_magma.cpp
EXECUTABLE = main_magma.exe

As you can see, I added -Xnvlink as the last flag, but the build then fails with the following error:

/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
make: *** [Makefile:10: main_magma.exe] Error 1

Which flags or options do I need so that the executable uses the combined power of the 2 NVLink-connected GPUs?

Edit:

An HPC engineer told me:

"The easiest way will be to use the Makefiles until we figure out how cmake can support that. If you do that, you can just replace LAPACKE_dgetrf by magma_dgetrf. MAGMA will use internally one GPU with out-of-memory algorithm that fill factor the matrix, even if it is large and does not fit into the memory of the GPU."

Does this mean I have to find the right Makefile flags to use magma_dgetrf instead of LAPACKE_dgetrf?
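As the engineer describes it, replacing LAPACKE_dgetrf with magma_dgetrf is a change to the source code, not a Makefile flag. The link line above already contains -lmagma.

The sketch below assumes MAGMA 2.x with its magma_v2.h header. It uses the CPU-interface routine magma_dgetrf(m, n, A, lda, ipiv, &info), which expects a column-major matrix. The helper name factor_lu and the USE_MAGMA macro are hypothetical. When the macro is not defined, a small unblocked LU with partial pivoting takes its place, so the sketch compiles and runs without MAGMA. Both paths follow LAPACK conventions: column-major storage, 1-based pivot indices, and info > 0 meaning a zero pivot.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>

#ifdef USE_MAGMA
#include <magma_v2.h>            // assumption: MAGMA 2.x; link with -lmagma
using idx_t = magma_int_t;       // 64-bit when built with -DMAGMA_ILP64
#else
using idx_t = std::int64_t;      // stand-in type when MAGMA is not available
#endif

// Hypothetical helper: LU-factor the column-major m x n matrix A in place,
// dgetrf style. Returns info (0 = success, k > 0 = U(k,k) is exactly zero).
idx_t factor_lu(idx_t m, idx_t n, double* A, idx_t lda, idx_t* ipiv) {
#ifdef USE_MAGMA
    // Stands in for LAPACKE_dgetrf(LAPACK_COL_MAJOR, m, n, A, lda, ipiv).
    // magma_init() must be called before this, magma_finalize() at the end.
    idx_t info = 0;
    magma_dgetrf(m, n, A, lda, ipiv, &info);
    return info;
#else
    // CPU reference: unblocked right-looking LU with partial pivoting.
    idx_t info = 0;
    const idx_t k = std::min(m, n);
    for (idx_t j = 0; j < k; ++j) {
        // Pick the row with the largest |A(i,j)| for i >= j.
        idx_t p = j;
        for (idx_t i = j + 1; i < m; ++i)
            if (std::fabs(A[i + j * lda]) > std::fabs(A[p + j * lda])) p = i;
        ipiv[j] = p + 1;                       // LAPACK pivots are 1-based
        if (A[p + j * lda] == 0.0) {           // column is zero: record and skip
            if (info == 0) info = j + 1;
            continue;
        }
        if (p != j)                            // swap entire rows j and p
            for (idx_t c = 0; c < n; ++c) std::swap(A[j + c * lda], A[p + c * lda]);
        for (idx_t i = j + 1; i < m; ++i)      // multipliers form column j of L
            A[i + j * lda] /= A[j + j * lda];
        for (idx_t c = j + 1; c < n; ++c)      // rank-1 update of the trailing block
            for (idx_t i = j + 1; i < m; ++i)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
    }
    return info;
#endif
}
```

Example: compiled without -DUSE_MAGMA, factoring the column-major array {1, 4, 2, 3} (rows [1 2; 4 3]) swaps the two rows. It returns info = 0 with ipiv = {2, 2}, leaving L's multiplier 0.25 and U = [4 3; 0 1.25] in place.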

And regarding the second sentence, where it says

"MAGMA will use internally one GPU with out-of-memory algorithm that fill factor the matrix"

Does this mean that if my matrix exceeds 48 GB, MAGMA will be able to put the remainder on the second A6000 GPU or in host RAM and still invert the full matrix?
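For scale, an n × n matrix of doubles takes 8n² bytes, so the largest matrix that fits in a given amount of memory grows only with its square root.

The sketch below computes that limit. The function names are hypothetical, and it takes "48 GB" as 48 × 10⁹ bytes. It also ignores the workspace a factorization needs, so the practical limit is somewhat lower.

```cpp
#include <cmath>
#include <cstdint>

// Bytes needed to store an n x n matrix of doubles (8 bytes per element).
std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(double);
}

// Largest n such that an n x n double matrix fits in `bytes` (no workspace).
std::uint64_t max_square_dim(std::uint64_t bytes) {
    std::uint64_t n = static_cast<std::uint64_t>(
        std::sqrt(static_cast<double>(bytes / sizeof(double))));
    // Correct any floating-point rounding in the square root.
    while (matrix_bytes(n + 1) <= bytes) ++n;
    while (n > 0 && matrix_bytes(n) > bytes) --n;
    return n;
}
```

For instance, max_square_dim(48000000000) is 77459. Pooling both cards (96 × 10⁹ bytes) only raises the limit to 109544, a factor of about √2.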

Please let me know which flags I should use in my case to build MAGMA correctly.

Currently, I am doing:

$ mkdir build && cd build
$ cmake -DUSE_FORTRAN=ON  \
-DGPU_TARGET=Ampere \
-DLAPACK_LIBRARIES="/opt/intel/oneapi/intelpython/latest/lib/liblapack.so" \
-DMAGMA_ENABLE_CUDA=ON ..
$ cmake --build . --config Release

Answer 1

I have used the last flag -Xnvlink ...

Let's consult the documentation (the nvlink options section of the nvcc manual):

The following table lists some useful nvlink options which can be specified with nvcc option --nvlink-options.

4.2.9.2.1. --disable-warnings (-w)
Inhibit all warning messages.

4.2.9.2.2. --preserve-relocs (-preserve-relocs)
Preserve resolved relocations in linked executable.

4.2.9.2.3. --verbose (-v)
Enable verbose mode which prints code generation statistics.

4.2.9.2.4. --warning-as-error (-Werror)
Make all warnings into errors.

4.2.9.2.5. --suppress-arch-warning (-suppress-arch-warning)
Suppress the warning that otherwise is printed when object does not contain code for target arch.

4.2.9.2.6. --suppress-stack-size-warning (-suppress-stack-size-warning)
Suppress the warning that otherwise is printed when stack size cannot be determined.

4.2.9.2.7. --dump-callgraph (-dump-callgraph)
Dump information about the callgraph and register usage.

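It should be clear from this that the option controls the behavior of the device linker (nvlink) during the build; -Xnvlink is the short form of --nvlink-options. None of this has anything to do with NVLink, which is a hardware interconnect technology. The two merely share a name.

Note also that -Xnvlink expects an argument. Placed last in LDFLAGS, it most likely swallowed the next token on the link command line, for example the object file containing main. That would explain the "undefined reference to `main'" error.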

How to use the right flag or options to include in the executable the combined power calls of 2 GPU with NVLink ?

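There is no such flag or option, because the compiler provides no automatic multi-GPU support. You have to write your own multi-GPU code, or use a library that someone else has written for you. If your executable contains such multi-GPU code, it will work without any special compiler options or flags.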
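To illustrate what "your own multi-GPU code" involves: the program itself must decide which part of the matrix lives on which GPU. (MAGMA, for instance, ships multi-GPU variants such as magma_dgetrf_mgpu for this purpose.) One common scheme is a 1-D block-cyclic distribution, where column blocks of width nb are dealt out round-robin to the devices.

The sketch below only computes that mapping. The struct, the function names and nb are hypothetical. The per-device allocation, the transfers (e.g. cudaSetDevice, cudaMemcpyPeer) and the kernels that a real implementation needs are left out.

```cpp
#include <cstdint>

// Location of a global column under a 1-D block-cyclic distribution.
struct ColumnOwner {
    int device;              // GPU that stores the column
    std::int64_t local_col;  // column index inside that GPU's local array
};

// Blocks of nb columns go round-robin: block 0 -> GPU 0, block 1 -> GPU 1, ...
ColumnOwner owner_of(std::int64_t col, std::int64_t nb, int ngpu) {
    std::int64_t block = col / nb;               // global block index
    int device = static_cast<int>(block % ngpu);
    std::int64_t local_block = block / ngpu;     // block index on that device
    return {device, local_block * nb + col % nb};
}

// Number of columns of an n-column matrix stored on `device`.
std::int64_t local_cols(std::int64_t n, std::int64_t nb, int ngpu, int device) {
    std::int64_t full_blocks = n / nb;
    std::int64_t rem = n % nb;                   // width of a trailing partial block
    std::int64_t cols = (full_blocks / ngpu) * nb;
    std::int64_t extra = full_blocks % ngpu;     // leftover full blocks: devices 0..extra-1
    if (device < extra) cols += nb;
    else if (device == extra) cols += rem;       // the partial block lands here
    return cols;
}
```

For example, with nb = 2 and two GPUs, a 10-column matrix puts 6 columns on GPU 0 (blocks 0, 2, 4) and 4 columns on GPU 1. Global column 5 becomes local column 3 on GPU 0.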
