Unable to use GPU from Docker. Building custom docker container image on top of tensorflow GPU docker image

I am trying to build a custom Docker image to serve our image classification model.

I am using Ubuntu 18.04 on Google Cloud with an Nvidia T4 GPU. On the same machine, TensorFlow GPU 1.9.0 works as expected outside Docker. When I build the Dockerfile with the following command:

sudo nvidia-docker build -t name .

I see the following error messages. The model is loaded on the CPU instead of the GPU, and inference runs on the CPU.

2021-01-05 20:46:59.617414: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-01-05 20:46:59.618426: E tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUresult(-1)
2021-01-05 20:46:59.618499: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:152] no NVIDIA GPU device is present: /dev/nvidia0 does not exist

Dockerfile:

FROM tensorflow/tensorflow:1.9.0-gpu-py3 as base
ENV CUDA_HOME /usr/local/cuda
ENV PATH=/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:$LD_LIBRARY_PATH
RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 \
     && echo "/usr/local/cuda/lib64/stubs" > /etc/ld.so.conf.d/z-cuda-stubs.conf \
     && ldconfig
ENV NVIDIA_VISIBLE_DEVICES all
ADD . /app
WORKDIR /app
RUN apt-get -yqq update
RUN apt-get install -yqq libsm6 libxext6 libxrender-dev
RUN pip install -r requirements.txt
RUN python3 run_model.py

Do I need to add anything else to my Dockerfile?

Nothing to worry about. Just fire up the system.
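
If the GPU is still not picked up, one thing that may be worth checking: the log lines in the question appear during the build, because `RUN python3 run_model.py` executes as a build step, and build steps normally have no GPU devices attached (hence `/dev/nvidia0 does not exist`). The following is only a sketch, not a confirmed fix. It assumes the host has the NVIDIA driver plus an NVIDIA container runtime installed (nvidia-docker2, or Docker 19.03+ with the NVIDIA Container Toolkit), and it leaves out the `libcuda.so` stub symlink on the assumption that the runtime injects the driver's real `libcuda` when the container starts.

```dockerfile
# Sketch only: assumes an NVIDIA container runtime is installed on the host.
FROM tensorflow/tensorflow:1.9.0-gpu-py3

# Ask the NVIDIA runtime to expose all GPUs to the container.
ENV NVIDIA_VISIBLE_DEVICES=all

WORKDIR /app

# Install system packages in one layer and clean the apt cache afterwards.
RUN apt-get -yqq update \
    && apt-get install -yqq libsm6 libxext6 libxrender-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies before copying the rest of the source,
# so this layer is reused when only application code changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

# CMD is executed by `docker run`, not by `docker build`, so the script
# starts only after the GPU devices have been attached to the container.
CMD ["python3", "run_model.py"]

# Build: sudo docker build -t name .
# Run:   sudo docker run --runtime=nvidia --rm name
#        (on Docker 19.03+: sudo docker run --gpus all --rm name)
```

With this layout the build no longer needs a GPU at all, and whether TensorFlow sees the T4 depends on how the container is started rather than on how the image is built.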