Tensorflow performance drop for second calculation
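I'm new to TensorFlow 2.0 and am considering using its GPU processing capabilities for some matrix calculations, so I tried some large matrix multiplications while measuring performance.
When I run it on one large matrix it is very fast. But when I then run it on other matrices, it becomes very slow. Even initializing very small tensors is slow.
Is this a problem because the matrices use too much memory? But even when I delete the variables with Python's del, the problem persists.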
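For reference, in CPython `del` only removes a name binding. The object itself is released only when its last reference disappears. The pure-Python sketch below makes that visible. `Buffer` is a hypothetical stand-in for a large tensor, and the immediate freeing shown is CPython reference-counting behaviour, not a language guarantee. The sketch says nothing about whether a framework then hands the underlying device memory back.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for a large tensor; only used to observe lifetime."""

    def __init__(self, nbytes):
        self.nbytes = nbytes  # no real memory of this size is allocated


buf = Buffer(2_016_000_000)
holder = [buf]              # a second reference, e.g. a list that still holds it
probe = weakref.ref(buf)    # observes the object without keeping it alive

del buf                              # removes the name "buf" only
still_alive = probe() is not None    # True: "holder" still references the object

holder.clear()                       # drop the last strong reference
freed = probe() is None              # True in CPython: refcount hit zero, object freed
print(still_alive, freed)            # True True
```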
My Python code:
import tensorflow as tf
import numpy as np
import time

a = np.ones((9000,4000))
b = np.ones((4000,9000))
a2 = [a,a,a,a,a,a,a]
b2 = [b,b,b,b,b,b,b]
a3 = np.ones((7,9000,4000))
b3 = np.ones((7,4000,9000))

with tf.device('/gpu:0'):
    # first multiplication
    a2 = tf.convert_to_tensor(a)
    b2 = tf.convert_to_tensor(b)
    start = time.time()
    c = tf.matmul([b2,b2,b2,b2,b2,b2,b2], [a2,a2,a2,a2,a2,a2,a2])
    print("first multiplication time: ", time.time() - start)
    del c, a2, b2

    # second multiplication
    a3 = tf.convert_to_tensor(a3)
    b3 = tf.convert_to_tensor(b3)
    start = time.time()
    c = tf.matmul(b3, a3)
    print("second multiplication time: ", time.time() - start)
    del c, a3, b3

    # third multiplication
    start = time.time()
    n = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='n')
    m = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='m')
    print("constant init time: ",time.time() - start)
    c = tf.matmul([n,n], [m,m])
    print("constant init plus third multiplication time: ", time.time() - start)
Output (TensorFlow info messages omitted):
first multiplication time: 0.7032458782196045
2021-02-07 20:40:36.004254: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
2021-02-07 20:40:36.588404: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
second multiplication time: 6.460264682769775
constant init time: 6.7629804611206055
constant init plus third multiplication time: 6.76327919960022
When I comment out the first multiplication, the output becomes:
2021-02-07 20:44:29.165061: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
2021-02-07 20:44:29.763323: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 2016000000 exceeds 10% of free system memory.
second multiplication time: 0.9040727615356445
constant init time: 7.273072242736816
constant init plus third multiplication time: 7.273530006408691
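The size in the warnings can be checked by hand. `np.ones` defaults to float64 (8 bytes per element), so each stacked `(7, 9000, 4000)` or `(7, 4000, 9000)` array is exactly 2,016,000,000 bytes. That is the figure reported, most likely when `a3` and `b3` are converted. The float32 line is only an illustration of how much a smaller dtype would save; it is an assumption, not something tested in the question. `array_bytes` is a hypothetical helper name.

```python
# Element sizes in bytes; np.ones defaults to float64.
FLOAT64_BYTES = 8
FLOAT32_BYTES = 4


def array_bytes(shape, itemsize):
    """Size in bytes of a dense array with the given shape and element size."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


a3_bytes = array_bytes((7, 9000, 4000), FLOAT64_BYTES)
b3_bytes = array_bytes((7, 4000, 9000), FLOAT64_BYTES)
print(a3_bytes, b3_bytes)                              # 2016000000 2016000000
print(array_bytes((7, 9000, 4000), FLOAT32_BYTES))     # 1008000000
```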
And when I run only the third calculation:
constant init time: 0.0499725341796875
constant init plus third multiplication time: 0.4284539222717285
I'd really like to understand what is happening, and maybe even find a way to improve it.
Thanks for your help!
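This happens because you are not transferring the tensors from the GPU back to the CPU, so they stay in GPU memory. I'm not sure about del; technically it should work in eager mode, but there has been a bug related to memory leaks (I'm not sure whether it has been fixed).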
If you call an additional function after tf.matmul,
c = tf.matmul(b3, a3).numpy()  # call numpy, which copies the result back to the CPU
you should get the correct timings:
first multiplication time: 8.76913070678711
second multiplication time: 8.516901731491089
constant init time: 0.0011458396911621094
constant init plus third multiplication time: 0.0024268627166748047
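The same idea can be wrapped in a small timing helper. This is a sketch, not part of the answer above, and `time_op` is a hypothetical name. It assumes that TensorFlow eager mode dispatches GPU work asynchronously. Under that assumption, the clock must stop only after blocking on the result (for example by calling `.numpy()`). Otherwise it measures little more than the launch. Warm-up calls keep one-time setup cost out of the average.

```python
import time
from typing import Any, Callable


def time_op(fn: Callable[[], Any],
            sync: Callable[[Any], Any] = lambda r: r,
            warmup: int = 1,
            repeats: int = 3) -> float:
    """Mean wall-clock seconds of fn(), with the clock stopped only after sync(result).

    sync should block until any pending work behind the result is finished;
    for a TensorFlow eager tensor, lambda t: t.numpy() is one way to do that
    (assumption: GPU ops are dispatched asynchronously).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        sync(fn())  # warm-up runs are not timed
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn())  # the timed region includes waiting for the result
        total += time.perf_counter() - start
    return total / repeats
```

For the second multiplication in the question, usage would look like `time_op(lambda: tf.matmul(b3, a3), sync=lambda t: t.numpy())`.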
Let me know if anything is missing...