GPU memory management issues when using TensorFlow

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      6944      C   python3                                    11585MiB |
|    1      6944      C   python3                                    11587MiB |
|    2      6944      C   python3                                    10621MiB |

After stopping TensorFlow partway through, the GPU memory shown by nvidia-smi is not released.

I tried using this:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'                  # best-fit-with-coalescing allocator
config.gpu_options.per_process_gpu_memory_fraction = 0.90  # cap the process at 90% of GPU memory
config.gpu_options.allow_growth = True                     # allocate GPU memory on demand
sess = tf.Session(config=config)

And also:

with tf.device('/gpu:0'):
    with tf.Graph().as_default():
        ...  # model code here

I also tried resetting the GPU with sudo nvidia-smi --gpu-reset -i 0

The memory simply will not be released.

The solution was obtained from , thanks to Yaroslav.

Most of the information was taken from the TensorFlow documentation on Stack Overflow. I am not allowed to post it; not sure why.

Insert this at the beginning of your code:

import os
from tensorflow.python.client import device_lib

# Set the environment variables (before TensorFlow initializes CUDA)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Double check that you have the correct devices visible to TF
print("{0}\nThe available CPU/GPU devices on your system\n{0}".format('=' * 100))
print(device_lib.list_local_devices())
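
As a side note, if you want to force CPU-only runs you can hide the GPUs entirely by setting CUDA_VISIBLE_DEVICES to an empty string. This is a sketch of an alternative to the "0" setting above, and it has to happen before TensorFlow initializes CUDA (i.e. before the first device_lib or session call):

import os

# Alternative to CUDA_VISIBLE_DEVICES = "0": hide every GPU so TensorFlow
# falls back to the CPU. Must be set before TensorFlow touches CUDA.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""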

There are different options for running on the GPU or the CPU. I am using the CPU here; this can be changed to one of the commented-out lines below.

# with tf.device('/gpu:0'):
# with tf.Graph().as_default():
with tf.device('/cpu:0'):
    ...  # your graph-building code goes here
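
For illustration, a minimal sketch of what goes inside such a device block; the small matmul graph is only a hypothetical placeholder for your own model code:

import tensorflow as tf

# Placeholder ops built under an explicit device scope (TF 1.x).
# Swap '/cpu:0' for '/gpu:0' to pin the same ops to the first GPU.
with tf.device('/cpu:0'):
    x = tf.random_normal([256, 256])
    y = tf.matmul(x, x)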

Use the following lines for the session:

config = tf.ConfigProto(device_count={'GPU': 1}, log_device_placement=False,
                        allow_soft_placement=True)
# allocate only as much GPU memory based on runtime allocations
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# Session needs to be closed
sess.close()
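
Putting the pieces together, here is a minimal end-to-end sketch (TF 1.x) with a trivial placeholder workload, so the session is created, used, and closed in one place:

import tensorflow as tf

config = tf.ConfigProto(device_count={'GPU': 1},
                        log_device_placement=False,
                        allow_soft_placement=True)
config.gpu_options.allow_growth = True            # grow GPU allocations on demand

with tf.Graph().as_default():
    total = tf.reduce_sum(tf.ones([1000, 1000]))  # placeholder workload
    sess = tf.Session(config=config)
    print(sess.run(total))                        # 1000000.0
    sess.close()                                  # always close the session when done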

The following line will also solve the problem of resources staying locked by the Python process, since the session is closed automatically when the with block exits:
with tf.Session(config=config) as sess:
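
A short usage sketch of the context-manager form, with a placeholder op; the session is released even if the code inside raises an exception:

import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True

# The session is closed automatically when the block exits,
# so the Python process does not keep holding the resources.
with tf.Session(config=config) as sess:
    print(sess.run(tf.constant(1.0) + tf.constant(2.0)))   # 3.0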

Another article also helps to understand the importance of . Please check the official tf.Session() parameter descriptions from TensorFlow:

    To find out which devices your operations and tensors are assigned to, create the session with 
    log_device_placement configuration option set to True.
    
    If you would like TensorFlow to automatically choose an existing and supported device to run the operations
    in case the specified one doesn't exist, you can set allow_soft_placement=True in the configuration option
    when creating the session.
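
As a quick illustration of those two flags together (the tiny matmul is just a placeholder): the log shows which device each op landed on, and soft placement keeps the script running even on a machine without a GPU:

import tensorflow as tf

config = tf.ConfigProto(log_device_placement=True,    # print each op's device assignment
                        allow_soft_placement=True)    # fall back if '/gpu:0' is unavailable

with tf.Graph().as_default():
    with tf.device('/gpu:0'):
        c = tf.matmul(tf.ones([2, 2]), tf.ones([2, 2]))
    with tf.Session(config=config) as sess:
        print(sess.run(c))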