Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED

I'm running tensorflow-gpu on Windows 10 with a simple MNIST neural network program. When it tries to run, it hits a CUBLAS_STATUS_ALLOC_FAILED error. A Google search turns up nothing.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

On Windows, tensorflow does not currently allocate all available memory as the documentation says; instead, you can work around this error by allowing dynamic memory growth, as follows:

tf.Session(config=tf.ConfigProto(allow_growth=True))

The location of the session config's "allow_growth" property now seems to be different. It is explained here: https://www.tensorflow.org/tutorials/using_gpu

So currently you have to set it like this:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

Tensorflow 2.0 Alpha

Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly, there are two methods you can try to achieve this.

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4)  # adjust this to the % of VRAM you
                                                    # want to give to tensorflow.

I suggest you try both and see if either helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

None of these fixes worked for me, as it seems the structure of the tensorflow libraries has changed. For Tensorflow 2.0, the only fix that worked for me is under the section Limiting GPU memory growth on this page: https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here is the solution from the docs - I imagine changing memory_limit may be necessary for some people - 1 GB was fine for my case.

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
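Note that the `memory_limit` argument is given in MiB, so 1024 reserves 1 GiB. As a quick sanity check on how to pick this number, here is a tiny hypothetical helper (`vram_fraction_to_mib` is my own name, not a TensorFlow API) that converts a fraction of total VRAM into that value:

```python
# Hypothetical helper (not part of TensorFlow): convert a fraction of
# total VRAM, given in GiB, into the MiB value expected by the
# memory_limit argument of VirtualDeviceConfiguration.
def vram_fraction_to_mib(total_gib, fraction):
    return int(total_gib * 1024 * fraction)

# The GTX 970 from the question has 4 GiB of VRAM; giving TensorFlow a
# quarter of it matches the 1 GB limit used above.
print(vram_fraction_to_mib(4, 0.25))  # -> 1024
```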

What worked for me:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

tensorflow>=2.0

import tensorflow as tf
config = tf.compat.v1.ConfigProto(
    gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

In my case, a stale python process was consuming memory. I killed it through Task Manager, and everything went back to normal.
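To spot such stale processes without the Task Manager GUI, `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader` lists every process currently holding GPU memory. A small sketch that parses that output into (pid, memory) pairs - the sample line below is fabricated for illustration:

```python
# Parse the CSV output of:
#   nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader
# into (pid, used_memory) pairs, so stale processes can be found and killed.
def parse_compute_apps(csv_text):
    procs = []
    for line in csv_text.strip().splitlines():
        pid, mem = [field.strip() for field in line.split(",")]
        procs.append((int(pid), mem))
    return procs

# Fabricated sample output, for illustration only.
sample = "4242, 3310 MiB\n1337, 120 MiB\n"
print(parse_compute_apps(sample))  # -> [(4242, '3310 MiB'), (1337, '120 MiB')]
```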

For TensorFlow 2.2, none of the other answers worked when I ran into the CUBLAS_STATUS_ALLOC_FAILED problem. Found the solution at https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

I run this code before doing any further computation, and I found that the same code that previously produced the CUBLAS error now works in the same session. The sample code above is the concrete example that sets memory growth across multiple physical GPUs, but it also solved the memory allocation problem for me.
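The same memory-growth behavior can also be switched on without touching the Python code: the GPU guide linked above documents the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable, which must be set before the TensorFlow process starts:

```shell
# Enable on-demand GPU memory allocation for every TensorFlow process
# started from this shell (equivalent to set_memory_growth(gpu, True)).
export TF_FORCE_GPU_ALLOW_GROWTH=true
echo "$TF_FORCE_GPU_ALLOW_GROWTH"
```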

A little late to the party, but this solved my problem with tensorflow 2.4.0 and a GTX 980 Ti. Before limiting memory, I got an error like:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this piece of code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

I found the solution here: https://www.tensorflow.org/guide/gpu