CUDA runtime gpu initialization with theano

I am trying to parallelize my neural network across two GPUs, following https://github.com/uoguelph-mlrg/theano_multi_gpu. I have all of the dependencies in place, but the CUDA runtime initialization fails with the following message:

ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device 0 failed:
cublasCreate() returned this error 'the CUDA Runtime initialization failed'
Error when trying to find the memory information on the GPU: invalid device ordinal
Error allocating 24 bytes of device memory (invalid device ordinal). Driver report 0 bytes free and 0 bytes total
ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
CudaNdarray_ZEROS: allocation failed.
Process Process-1:
Traceback (most recent call last):
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/share/Python-2.7.9/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/u/bsankara/nt/Git-nt/nt/train_attention.py", line 171, in launch_train
    clip_c=1.)
  File "/u/bsankara/nt/Git-nt/nt/nt.py", line 1616, in train
    import theano.sandbox.cuda
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/__init__.py", line 98, in <module>
    theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/tests/test_driver.py", line 30, in test_nvidia_driver1
    A = cuda.shared_constructor(a)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/var.py", line 181, in float32_shared_constructor
    enable_cuda=False)
  File "/opt/share/Python-2.7.9/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py", line 389, in use
    cuda_ndarray.cuda_ndarray.CudaNdarray.zeros((2, 3))
RuntimeError: ('CudaNdarray_ZEROS: allocation failed.', 'You asked to force this device and it failed. No fallback to the cpu or other gpu device.')

The relevant part of the code is here:

import os
from multiprocessing import Queue
import zmq
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # pycuda and zmq environment

    drv.init()
    dev = drv.Device(private_args['ind_gpu'])
    ctx = dev.make_context()
    sock = zmq.Context().socket(zmq.PAIR)

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')

    ####
    # import theano stuffs
    import theano.sandbox.cuda
    theano.sandbox.cuda.use(private_args['gpu'])

    import theano
    import theano.tensor as tensor
    from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
    import theano.misc.pycuda_init
    import theano.misc.pycuda_utils
...

The error is triggered by the import of theano.sandbox.cuda. This is where I launch the train function as two processes:

def launch_train(curr_args, process_env, curr_queue, oth_queue):
    trainerr, validerr, testerr = train(private_args=curr_args,
                                        process_env=process_env,
                                        ...)

process1_env = os.environ.copy()
process1_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu0,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU1"
process2_env = os.environ.copy()
process2_env['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu1,floatX=float32,on_unused_input=ignore,optimizer=fast_run,exception_verbosity=high,compiledir=/u/bsankara/.theano/NT_multi_GPU2"

p = Process(target=launch_train,
            args=(p_args, process1_env, queue_p, queue_q))
q = Process(target=launch_train,
            args=(q_args, process2_env, queue_q, queue_p))

p.start()
q.start()
p.join()
q.join()
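
For context, the pieces this excerpt assumes but does not show (the multiprocessing imports, the two queues, and the per-process argument dicts) would look roughly like the sketch below. The keys 'gpu', 'ind_gpu' and 'flag_client' are the ones train() reads above; the concrete values here are only illustrative.

import os
from multiprocessing import Process, Queue

# Hypothetical per-process argument dicts; train() reads these keys.
p_args = {'gpu': 'gpu0', 'ind_gpu': 0, 'flag_client': True}   # connects the PAIR socket
q_args = {'gpu': 'gpu1', 'ind_gpu': 1, 'flag_client': False}  # binds the PAIR socket

# Host-side queues used to coordinate the two trainer processes.
queue_p = Queue()
queue_q = Queue()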

However, the import statements seem to work if I initialize the GPU interactively in Python. I executed the first 20 lines of train() in an interactive session, and it ran fine there and correctly assigned me to gpu0 as requested.
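
Concretely, the interactive check was roughly the top of train() condensed into a session like the sketch below (the THEANO_FLAGS string is abbreviated from the full one above):

import os
os.environ['THEANO_FLAGS'] = "cuda.root=/opt/share/cuda-7.0,device=gpu0,floatX=float32"  # abbreviated

import pycuda.driver as drv
drv.init()
ctx = drv.Device(0).make_context()   # grab gpu0 through pycuda first, as train() does

import theano.sandbox.cuda           # succeeds here, unlike inside the spawned process
theano.sandbox.cuda.use('gpu0')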

After digging further with pdb, the original poster found the problem.

Essentially, theano and pycuda were both competing to initialize the GPU, which is what caused the problem. The fix is to import theano first, so that it grabs a GPU, and then attach pycuda to the specific context theano created. The import section of the train function then looks like this:

def train(private_args, process_env, <some other args>):
    if process_env is not None:
        os.environ = process_env

    ####
    # import theano related
    # We need global imports and so we make them as such
    theano = __import__('theano')
    _t_tensor = __import__('theano', globals(), locals(), ['tensor'], -1)
    tensor = _t_tensor.tensor

    import theano.sandbox.cuda
    import theano.misc.pycuda_utils

    ####
    # pycuda and zmq environment
    import zmq
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    drv.init()
    # Attach the existing context (already initialized by theano import statement)
    ctx = drv.Context.attach()
    sock = zmq.Context().socket(zmq.PAIR)

    if private_args['flag_client']:
        sock.connect('tcp://localhost:5000')
    else:
        sock.bind('tcp://*:5000')
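
As an aside, theano.misc.pycuda_utils is imported here because, with this old sandbox.cuda backend, it provides converters between Theano's CudaNdarray and pycuda's GPUArray, so that pycuda can work on Theano's GPU buffers directly. A rough sketch of that use, continuing inside train() after the imports above and assuming the to_gpuarray helper of that Theano version:

import numpy

# A shared variable lives on the GPU because device=gpuN was selected above.
W = theano.shared(numpy.zeros((1024, 1024), dtype='float32'))

# Borrow the underlying CudaNdarray without copying it back to the host ...
W_cuda = W.get_value(borrow=True, return_internal_type=True)

# ... and wrap it as a pycuda GPUArray so it can be handled on the pycuda side.
W_gpu = theano.misc.pycuda_utils.to_gpuarray(W_cuda)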

[This answer was added as a community-wiki entry, assembled from edits made by the OP, to get this question off the unanswered list.]