GPU memory management issues when using TensorFlow

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6944      C   python3                                    11585MiB |
|    1      6944      C   python3                                    11587MiB |
|    2      6944      C   python3                                    10621MiB |

After stopping TensorFlow partway through, the GPU memory shown by nvidia-smi is not released.

I tried using this:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'                  # best-fit-with-coalescing allocator
config.gpu_options.per_process_gpu_memory_fraction = 0.90  # cap the process at 90% of GPU memory
config.gpu_options.allow_growth = True                     # allocate GPU memory on demand
sess = tf.Session(config=config)

And also:

with tf.device('/gpu:0'):
    with tf.Graph().as_default():
        ...  # model code here

I also tried resetting the GPU with sudo nvidia-smi --gpu-reset -i 0

The memory simply will not be released.

The solution was obtained from , thanks to Yaroslav.

Most of the information was taken from the TensorFlow documentation on Stack Overflow. I am not allowed to post it; not sure why.

Insert this at the beginning of your code:

import os
from tensorflow.python.client import device_lib

# Set the environment variables (before TensorFlow initializes CUDA)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Double check that you have the correct devices visible to TF
print("{0}\nThe available CPU/GPU devices on your system\n{0}".format('=' * 100))
print(device_lib.list_local_devices())
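
As a side note, if you want to force CPU-only runs you can hide the GPUs entirely by setting CUDA_VISIBLE_DEVICES to an empty string. This is a sketch of an alternative to the "0" setting above, and it has to happen before TensorFlow initializes CUDA (i.e. before the first device_lib or session call):

import os

# Alternative to CUDA_VISIBLE_DEVICES = "0": hide every GPU so TensorFlow
# falls back to the CPU. Must be set before TensorFlow touches CUDA.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""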

There are different options for running on the GPU or the CPU. I am using the CPU here; this can be changed to one of the commented-out lines below.

# with tf.device('/gpu:0'):
# with tf.Graph().as_default():
with tf.device('/cpu:0'):
    ...  # your graph-building code goes here
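
For illustration, a minimal sketch of what goes inside such a device block; the small matmul graph is only a hypothetical placeholder for your own model code:

import tensorflow as tf

# Placeholder ops built under an explicit device scope (TF 1.x).
# Swap '/cpu:0' for '/gpu:0' to pin the same ops to the first GPU.
with tf.device('/cpu:0'):
    x = tf.random_normal([256, 256])
    y = tf.matmul(x, x)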

Use the following lines for the session:

config = tf.ConfigProto(device_count={'GPU': 1}, log_device_placement=False,
                        allow_soft_placement=True)
# allocate only as much GPU memory based on runtime allocations
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# Session needs to be closed
sess.close()
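
Putting the pieces together, here is a minimal end-to-end sketch (TF 1.x) with a trivial placeholder workload, so the session is created, used, and closed in one place:

import tensorflow as tf

config = tf.ConfigProto(device_count={'GPU': 1},
                        log_device_placement=False,
                        allow_soft_placement=True)
config.gpu_options.allow_growth = True            # grow GPU allocations on demand

with tf.Graph().as_default():
    total = tf.reduce_sum(tf.ones([1000, 1000]))  # placeholder workload
    sess = tf.Session(config=config)
    print(sess.run(total))                        # 1000000.0
    sess.close()                                  # always close the session when done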

The following line will also solve the problem of resources staying locked by the Python process, since the session is closed automatically when the with block exits:
with tf.Session(config=config) as sess:
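
A short usage sketch of the context-manager form, with a placeholder op; the session is released even if the code inside raises an exception:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True

# The session is closed automatically when the block exits,
# so the Python process does not keep holding the resources.
with tf.Session(config=config) as sess:
    print(sess.run(tf.constant(1.0) + tf.constant(2.0)))   # 3.0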

Another article also helps to understand the importance of . Please check the official tf.Session() parameter descriptions from TensorFlow:

    To find out which devices your operations and tensors are assigned to, create the session with 
    log_device_placement configuration option set to True.
    
    If you would like TensorFlow to automatically choose an existing and supported device to run the operations
    in case the specified one doesn't exist, you can set allow_soft_placement=True in the configuration option
    when creating the session.
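
As a quick illustration of those two flags together (the tiny matmul is just a placeholder): the log shows which device each op landed on, and soft placement keeps the script running even on a machine without a GPU:

import tensorflow as tf

config = tf.ConfigProto(log_device_placement=True,    # print each op's device assignment
                        allow_soft_placement=True)    # fall back if '/gpu:0' is unavailable

with tf.Graph().as_default():
    with tf.device('/gpu:0'):
        c = tf.matmul(tf.ones([2, 2]), tf.ones([2, 2]))
    with tf.Session(config=config) as sess:
        print(sess.run(c))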