Tensorflow uses same amount of gpu memory regardless of batch size

I'm new to tensorflow and I'm trying to train on the CIFAR-10 dataset. I noticed that, according to my nvidia control panel, 97% of my GPU memory is in use regardless of what batch size I pick. I tried batch sizes from 100 down to 2, and in every case GPU memory usage was 97%. Why does it do this?

import random
import numpy as np
import cv2

def batchGenerator(batch_size=32):
    bi = 0
    random.shuffle(train_data)
    while bi + batch_size < len(train_data):
        x = np.zeros((batch_size, 32, 32, 3))
        y = np.zeros((batch_size, 10))
        for b in range(batch_size):
            x[b] = train_data[bi + b][0]
            # cv2.flip returns a new array, so assign the result back
            if random.choice((True, False)):
                x[b] = cv2.flip(x[b], 0)  # random vertical flip
            if random.choice((True, False)):
                x[b] = cv2.flip(x[b], 1)  # random horizontal flip
            y[b][train_data[bi + b][1]] = 1  # one-hot label
        bi += batch_size
        yield (x, y)

with tf.Session() as s:
    s.run(tf.global_variables_initializer())  # initialize_all_variables is deprecated
    for epoch in range(100):
        a = 0.0
        i = 0
        for x_train, y_train in batchGenerator(2):
            outs = s.run([opt, accuracy], feed_dict = {x: x_train, y_exp: y_train, p_keep: 0.5})
            a += outs[-1]
            i += 1
        print('Epoch', epoch, 'Accuracy', a / i)
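As an aside, `cv2.flip` returns a new array rather than modifying its argument, so the flip result needs to be assigned back into the batch. A minimal numpy-only sketch of that augmentation step (using `np.flipud`/`np.fliplr`, which correspond to `cv2.flip` with flip codes 0 and 1; the `augment` helper here is illustrative, not part of the original code):

```python
import random
import numpy as np

def augment(batch, rng=random):
    """Randomly mirror each image; the flipped copy must be kept,
    since the flip functions return new arrays rather than mutating."""
    out = []
    for img in batch:
        if rng.choice((True, False)):
            img = np.flipud(img)  # vertical flip, like cv2.flip(img, 0)
        if rng.choice((True, False)):
            img = np.fliplr(img)  # horizontal flip, like cv2.flip(img, 1)
        out.append(img)
    return np.stack(out)

# Two dummy 32x32x3 "images"
x = np.arange(2 * 32 * 32 * 3, dtype=np.float32).reshape(2, 32, 32, 3)
out = augment(x)
print(out.shape)  # (2, 32, 32, 3)
```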

This question is related to:

TensorFlow using all of the GPU's memory by default is normal behaviour. From the Using GPUs tutorial:

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation.


They also provide different options if you need TensorFlow to take up less memory, for example:

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

(from the linked docs), which limits memory usage to 40% of the GPU's available memory.
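The same guide also documents `allow_growth`, which starts with a small allocation and grows it as the process needs more memory, instead of reserving a fixed fraction up front (a TF 1.x session config fragment, analogous to the snippet above):

```python
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
session = tf.Session(config=config)
```

Note that with either option the nvidia control panel will still show whatever TensorFlow has reserved, not what your model actually needs for a given batch size.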