Building TensorFlow on Jetson Xavier fails to find CUDA
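I am trying to compile the TensorFlow 2.3 C API for Xavier inside a Docker image. I am using nvcr.io/nvidia/l4t-base:r32.5.0 as the base Docker image, which appears to have the correct version of CUDA installed, but the build fails with the following message: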
ERROR: no such package '@local_config_cuda//cuda': Traceback (most recent call last):
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 1369
#9 51.98 _create_local_cuda_repository(<1 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 955, in _create_local_cuda_repository
#9 51.98 _get_cuda_config(repository_ctx, <1 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 657, in _get_cuda_config
#9 51.98 find_cuda_config(repository_ctx, <2 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 635, in find_cuda_config
#9 51.98 _exec_find_cuda_config(<3 more arguments>)
#9 51.98 File "/tensorflow/third_party/gpus/cuda_configure.bzl", line 629, in _exec_find_cuda_config
#9 51.98 execute(repository_ctx, <1 more arguments>)
#9 51.98 File "/tensorflow/third_party/remote_config/common.bzl", line 208, in execute
#9 51.98 fail(<1 more arguments>)
#9 51.98 Repository command failed
#9 51.98 Could not find any libcudart.so.10* in any subdirectory:
#9 51.98 ''
#9 51.98 'lib64'
#9 51.98 'lib'
#9 51.98 'lib/*-linux-gnu'
#9 51.98 'lib/x64'
#9 51.98 'extras/CUPTI/*'
#9 51.98 of:
#9 51.98 '/usr/local/cuda-10.2'
Here is the relevant part of my Dockerfile for reference:
FROM nvcr.io/nvidia/l4t-base:r32.5.0
# ... setup bazel etc
# Tensorflow
ENV TF_NEED_CUDA=1 \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    TF_CUDA_VERSION=10.2 \
    CUDA_TOOLKIT_PATH=/usr/local/cuda-10.2 \
    TF_CUDNN_VERSION=8 \
    CUDNN_INSTALL_PATH=/usr/local/cuda-10.2 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.2,7.5 \
    CC_OPT_FLAGS="--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 --copt=-mfpmath=both --config=cuda" \
    PYTHON_BIN_PATH="/usr/bin/python" \
    USE_DEFAULT_PYTHON_LIB_PATH=1 \
    TF_NEED_JEMALLOC=1 \
    TF_NEED_GCP=0 \
    TF_NEED_HDFS=0 \
    TF_ENABLE_XLA=0 \
    TF_NEED_OPENCL=0
RUN cd / && git clone https://github.com/tensorflow/tensorflow
# The bazel build in the next line fails
RUN cd /tensorflow && git checkout r2.3 && bazel build -c opt //tensorflow/tools/lib_package:libtensorflow
Am I missing some build options, or do I have to perform some additional steps to set up CUDA correctly?
It appears that TensorFlow 2.3 cannot be built with CUDA for 64-bit ARM. TensorFlow 2.3 requires CUDA 10.2, but the CUDA toolkit is not supported on ARM until version 11 [1], and TensorFlow does not support CUDA 11 until version 2.4 [1].
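To confirm whether the library is really absent from the image at build time, it may help to repeat by hand the search described in the error message. The following is only a rough sketch: the function name find_cudart is hypothetical, the subdirectory list is copied from the error output in the question, and it does not claim to reproduce TensorFlow's own detection logic. It could be run from a RUN step placed before the bazel build.

```shell
#!/bin/sh
# Hypothetical helper (not part of TensorFlow): look for libcudart.so.10*
# in the subdirectories that the error message lists as searched.
find_cudart() {
    root="$1"
    for sub in "" lib64 lib "lib/*-linux-gnu" lib/x64 "extras/CUPTI/*"; do
        # $sub is intentionally unquoted so the shell expands its glob.
        for candidate in "$root"/$sub/libcudart.so.10*; do
            if [ -e "$candidate" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    return 1
}

# Default to the toolkit path used in the question's Dockerfile.
if find_cudart "${CUDA_TOOLKIT_PATH:-/usr/local/cuda-10.2}"; then
    echo "libcudart found"
else
    echo "libcudart not found under the CUDA toolkit path" >&2
fi
```

If the script reports that nothing was found, the toolkit directory does not contain the runtime library when the image is being built, which matches the failure above.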