Cannot open a tensorflow session

When I try to open a TensorFlow session, I get the following error:

2017-09-24 10:49:20.526121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.342
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.87GiB
2017-09-24 10:49:20.599629: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x3dcf7e0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-09-24 10:49:20.599947: E tensorflow/core/common_runtime/direct_session.cc:171] Internal: failed initializing StreamExecutor for CUDA device ordinal 1: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/python-envs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1486, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/user/python-envs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 621, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/home/user/python-envs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

I have two GPUs in my system, one for display and the other for computation:

GPU0 (display) : Nvidia NVS 310 
GPU1 (compute) : Nvidia GeForce GTX 970
Graphics Driver: 384.66
CUDA version   : 8
cuDNN version  : v6 for CUDA 8 (April 27, 2017)
Operating Sys. : Ubuntu 16.04

Has anyone else run into this problem? How should I go about debugging or fixing it?

Note: I did try to open an issue on GitHub, but before I could finish, I was asked to look for earlier questions on SO or to ask there instead.

Thanks!

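TensorFlow appears to try to claim every available GPU for computation, as described in the GitHub issue linked below. Setting the environment variable CUDA_VISIBLE_DEVICES to the device I want to use for computation fixed the problem.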
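A minimal sketch of applying this fix from inside a Python script. The helper name select_gpu is hypothetical. The variables must be set before TensorFlow is imported, because CUDA reads them only once, when it first initializes. The sketch also sets CUDA_DEVICE_ORDER=PCI_BUS_ID (not part of the original answer). By default CUDA numbers devices fastest-first, which need not match the numbering nvidia-smi shows, so this makes the index passed in agree with nvidia-smi.

```python
import os


def select_gpu(device_id, pci_bus_order=True):
    """Hypothetical helper: expose a single GPU to CUDA in this process.

    Call it before `import tensorflow`; CUDA reads these variables only
    once, when it is first initialized.
    """
    if pci_bus_order:
        # Number devices by PCI bus ID so the index matches nvidia-smi;
        # CUDA's default ordering is FASTEST_FIRST.
        os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Only the listed device is visible to CUDA; inside the process it is
    # renumbered as device 0.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
```

Usage, assuming nvidia-smi lists the GTX 970 as GPU 1: call select_gpu(1) as the very first thing in the script, then import tensorflow and create the session as before. The shell equivalent is CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python your_script.py, where your_script.py is a placeholder.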

Possibly related issues on GitHub include: Segmentation fault when GPUs are already used

On Ubuntu, the device IDs can be checked by running the nvidia-smi utility.
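A small sketch for listing the GPUs from Python, assuming nvidia-smi is on the PATH. It uses nvidia-smi's --query-gpu and --format=csv,noheader options to print one line per GPU with its index, name and PCI bus ID. The bus ID helps confirm which physical card a given index refers to. The function names are hypothetical. Parsing is kept in a separate function so it can be checked without a GPU.

```python
import subprocess


def parse_gpu_list(csv_text):
    """Parse `nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader`.

    Returns a list of (index, name, pci_bus_id) tuples.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Fields are separated by ", "; the name sits between the index and
        # the bus ID, so rejoin the middle fields in case it contains a comma.
        fields = [field.strip() for field in line.split(",")]
        index, bus_id = int(fields[0]), fields[-1]
        name = ", ".join(fields[1:-1])
        gpus.append((index, name, bus_id))
    return gpus


def list_gpus():
    """Run nvidia-smi and return the GPUs in nvidia-smi's own numbering."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,name,pci.bus_id",
         "--format=csv,noheader"],
        universal_newlines=True,  # return str rather than bytes
    )
    return parse_gpu_list(out)
```

The index reported for the compute card is the value to put in CUDA_VISIBLE_DEVICES, together with CUDA_DEVICE_ORDER=PCI_BUS_ID so that CUDA numbers the devices the same way nvidia-smi does.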