How to properly combine GPUs when we have 2 NVIDIA/CUDA GPU cards with an NVLink hardware component
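On Debian 10, I have 2 RTX A6000 GPU cards with an NVLink hardware component, and I would like to benefit from the potential combined power of both cards.
Currently, I have the following magma.make, which is invoked by a Makefile: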
CXX = nvcc -std=c++17 -O3
LAPACK = /opt/intel/oneapi/mkl/latest
LAPACK_ANOTHER=/opt/intel/mkl/lib/intel64
MAGMA = /usr/local/magma
INCLUDE_CUDA=/usr/local/cuda/include
LIBCUDA=/usr/local/cuda/lib64
SEARCH_DIRS_INCL=-I${MAGMA}/include -I${INCLUDE_CUDA} -I${LAPACK}/include
SEARCH_DIRS_LINK=-L${LAPACK}/lib/intel64 -L${LAPACK_ANOTHER} -L${LIBCUDA} -L${MAGMA}/lib
CXXFLAGS = -c -DMAGMA_ILP64 -DMKL_ILP64 -m64 ${SEARCH_DIRS_INCL}
LDFLAGS = ${SEARCH_DIRS_LINK} -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp -lcuda -lcudart -lcublas -lmagma -lpthread -lm -ldl -Xnvlink
SOURCES = main_magma.cpp XSAF_C_magma.cpp
EXECUTABLE = main_magma.exe
As you can see, I have used the last flag, -Xnvlink,
but it produces the following error when building:
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/8/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
make: *** [Makefile:10: main_magma.exe] Error 1
How can I use the right flags or options so that the executable makes use of the combined power of the 2 GPUs with NVLink?
EDIT:
An HPC engineer told me:
"The easiest way will be to use the Makefiles until we figure out how
cmake can support that. If you do that, you can just replace
LAPACKE_dgetrf by magma_dgetrf. MAGMA will use internally one GPU with
out-of-memory algorithm that fill factor the matrix, even if it is
large and does not fir into the memory of the GPU."
Does this mean I have to find the appropriate flags for the Makefile in order to use magma_dgetrf instead of LAPACKE_dgetrf?
As for the second sentence, which says
"MAGMA will use internally one GPU with out-of-memory algorithm that
fill factor the matrix"
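does it mean that if my matrix is larger than 48 GB, MAGMA will be able to place the rest in the second A6000 GPU or in RAM and perform the inversion of the full matrix?
Please let me know which flags to use to build MAGMA correctly in my case.
Currently, I do: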
$ mkdir build && cd build
$ cmake -DUSE_FORTRAN=ON \
-DGPU_TARGET=Ampere \
-DLAPACK_LIBRARIES="/opt/intel/oneapi/intelpython/latest/lib/liblapack.so" \
-DMAGMA_ENABLE_CUDA=ON ..
$ cmake --build . --config Release
I have used the last flag -Xnvlink ...
Let's consult some documentation:
The following table lists some useful nvlink options which can be specified with nvcc option --nvlink-options.
4.2.9.2.1. --disable-warnings (-w)
Inhibit all warning messages.
4.2.9.2.2. --preserve-relocs (-preserve-relocs)
Preserve resolved relocations in linked executable.
4.2.9.2.3. --verbose (-v)
Enable verbose mode which prints code generation statistics.
4.2.9.2.4. --warning-as-error (-Werror)
Make all warnings into errors.
4.2.9.2.5. --suppress-arch-warning (-suppress-arch-warning)
Suppress the warning that otherwise is printed when object does not contain code for target arch.
4.2.9.2.6. --suppress-stack-size-warning (-suppress-stack-size-warning)
Suppress the warning that otherwise is printed when stack size cannot be determined.
4.2.9.2.7. --dump-callgraph (-dump-callgraph)
Dump information about the callgraph and register usage.
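It should be evident from this text that the option controls the behavior of the device linker at build time. None of it has anything to do with NVLink, which is a hardware interconnect technology.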
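As a minimal sketch, the link settings from the question could look like this once the flag is dropped, assuming the rest of magma.make stays as posted. Since -Xnvlink expects an option list of its own, leaving it at the end of LDFLAGS may cause it to swallow whatever the Makefile places next on the link line. That is one plausible reason the linker no longer found main, but the actual Makefile rule was not shown, so treat this explanation as an assumption.

```makefile
# Same link line as in the question, minus the trailing -Xnvlink.
# -Xnvlink only forwards options to the CUDA device linker (nvlink);
# it does not enable or configure the NVLink hardware bridge.
LDFLAGS = ${SEARCH_DIRS_LINK} -lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lgomp \
          -lcuda -lcudart -lcublas -lmagma -lpthread -lm -ldl

# Side note, not part of the original question: CXXFLAGS defines -DMKL_ILP64,
# which is normally paired with -lmkl_intel_ilp64 rather than -lmkl_intel_lp64.
# The library name is left unchanged here.
```

This only removes the build error. It does not by itself make the program use both GPUs.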
How to use the right flag or options to include in the executable the combined power calls of 2 GPU with NVLink ?
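There is no such flag or option. There is no compiler-assisted multi-GPU support. You must either write your own multi-GPU code or use a library in which someone else has written it for you. If such multi-GPU code is present in your executable, it will work without any special compiler options or flags at build time.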