TensorFlow performance drop on the second calculation

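I am new to TensorFlow 2.0 and am considering using its GPU processing for some matrix calculations, so I tried a few large matrix multiplications while measuring performance. When I run it on one large matrix, it is very fast. But when I then run it on other matrices, it becomes very slow. Even the initialization of very small tensors is slow. Is this a problem because the matrices use too much memory? Even if I delete the variables with Python's del, the problem persists.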


import tensorflow as tf
import numpy as np
import time

a = np.ones((9000,4000))
b = np.ones((4000,9000))

a2 = [a,a,a,a,a,a,a]
b2 = [b,b,b,b,b,b,b]

a3 = np.ones((7,9000,4000))
b3 = np.ones((7,4000,9000))

with tf.device('/gpu:0'):
    # first multiplication

    a2 = tf.convert_to_tensor(a)
    b2 = tf.convert_to_tensor(b)

    start = time.time()
    c = tf.matmul([b2,b2,b2,b2,b2,b2,b2], [a2,a2,a2,a2,a2,a2,a2])
    print("first multiplication time: ", time.time() - start)
    del c, a2, b2

    # second multiplication

    a3 = tf.convert_to_tensor(a3)
    b3 = tf.convert_to_tensor(b3)

    start = time.time()
    c = tf.matmul(b3, a3)
    print("second multiplication time: ", time.time() - start)
    del c, a3, b3

    # third multiplication

    start = time.time()
    n = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='n')
    m = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='m')
    print("constant init time: ",time.time() - start)

    c = tf.matmul([n,n], [m,m])
    print("constant init plus third multiplication time: ", time.time() - start)


first multiplication time:  0.7032458782196045
2021-02-07 20:40:36.004254: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
2021-02-07 20:40:36.588404: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
second multiplication time:  6.460264682769775
constant init time:  6.7629804611206055
constant init plus third multiplication time:  6.76327919960022


2021-02-07 20:44:29.165061: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
2021-02-07 20:44:29.763323: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
second multiplication time:  0.9040727615356445
constant init time:  7.273072242736816
constant init plus third multiplication time:  7.273530006408691


constant init time:  0.0499725341796875
constant init plus third multiplication time:  0.4284539222717285



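This happens because you are not transferring the tensors from the GPU back to the CPU, so they keep occupying GPU memory. I am not sure about del: technically it should work in eager mode, but there is a bug related to a memory leak (I am not sure whether it has been fixed).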


c = tf.matmul(b3, a3).numpy()  # calling .numpy() copies the result back to the CPU


first multiplication time:  8.76913070678711
second multiplication time:  8.516901731491089
constant init time:  0.0011458396911621094
constant init plus third multiplication time:  0.0024268627166748047
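
One way to apply the suggestion above to every measurement is to call .numpy() inside the timed region, so that each timer stops only once the product has been copied back to host memory. The following is a minimal sketch rather than a definitive benchmark: `timed_matmul` is a hypothetical helper (not part of TensorFlow), and the matrix sizes are deliberately small so the example also runs on a machine without a GPU.

```python
import time

import numpy as np
import tensorflow as tf


def timed_matmul(x, y):
    """Multiply x @ y and return (result as a NumPy array, elapsed seconds).

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    start = time.time()
    # .numpy() copies the result to host memory, so the timer stops only
    # after the product is actually available on the CPU.
    result = tf.matmul(x, y).numpy()
    return result, time.time() - start


# Small illustrative sizes (assumption); the question used
# (7, 4000, 9000) @ (7, 9000, 4000).
b3 = tf.convert_to_tensor(np.ones((7, 40, 90)))
a3 = tf.convert_to_tensor(np.ones((7, 90, 40)))

c, elapsed = timed_matmul(b3, a3)
print("multiplication time: ", elapsed)
print("result shape: ", c.shape)
```

With the copy inside the timed region, the cost of each multiplication is attributed to its own measurement instead of showing up in a later, unrelated step such as the constant initialization.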
