Tensorflow matmul calculations on GPU are slower than on CPU

Question

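I'm experimenting with GPU computations for the first time and was, of course, hoping for a big speed-up. However, with a basic example in tensorflow, it was actually worse: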

On cpu:0, each of the ten runs takes 2 seconds on average; gpu:0 takes 2.7 seconds, and gpu:1 is 50% worse than cpu:0 at 3 seconds.

Here's the code:

import tensorflow as tf
import numpy as np
import time
import random

for _ in range(10):
    with tf.Session() as sess:
        start = time.time()
        with tf.device('/gpu:0'): # swap for 'cpu:0' or whatever
            a = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='a')
            b = tf.constant([random.random() for _ in range(1000 * 1000)], shape=[1000, 1000], name='b')
            c = tf.matmul(a, b)
            d = tf.matmul(a, c)
            e = tf.matmul(a, d)
            f = tf.matmul(a, e)
            for _ in range(1000):
                sess.run(f)
        end = time.time()
        print(end - start)

What am I observing here? Is the run time perhaps dominated mainly by copying data between RAM and the GPU?

Answer 1

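The way you generate the data is executed on the CPU (random.random() is a regular Python function, not a TensorFlow one). Also, executing it 10^6 times will be slower than requesting 10^6 random numbers in a single run. Change the code to: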

a = tf.random_uniform([1000, 1000], name='a')
b = tf.random_uniform([1000, 1000], name='b')

This way the data is generated in parallel on the GPU, and no time is wasted transferring it from RAM to the GPU.
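
To check how much of the measured time is data generation and graph construction rather than the matmuls themselves, the two phases can be timed separately. The following is only a rough sketch under a few assumptions that are not in the original post: the helper name benchmark is made up for illustration, tf.compat.v1 is used so the same code should run on TensorFlow 1.13+ and 2.x, allow_soft_placement is switched on so a missing GPU falls back to the CPU instead of raising an error, and one warm-up run is excluded from the timing.

```python
import time

import tensorflow as tf

# Assumption: TensorFlow >= 1.13, where the tf.compat.v1 namespace exists.
tf1 = tf.compat.v1


def benchmark(device, size=1000, runs=1000):
    """Hypothetical helper: return (build_seconds, run_seconds) for `device`."""
    graph = tf.Graph()

    # Phase 1: graph construction. The inputs are TensorFlow ops placed on
    # the target device, so no Python-side random.random() loop is involved.
    build_start = time.time()
    with graph.as_default(), tf.device(device):
        a = tf1.random_uniform([size, size], name='a')
        b = tf1.random_uniform([size, size], name='b')
        c = tf.matmul(a, b)
        d = tf.matmul(a, c)
        e = tf.matmul(a, d)
        f = tf.matmul(a, e)
    build_seconds = time.time() - build_start

    # Assumption: soft placement, so a machine without the requested
    # device runs the ops on the CPU rather than failing.
    config = tf1.ConfigProto(allow_soft_placement=True)

    # Phase 2: execution only.
    with tf1.Session(graph=graph, config=config) as sess:
        sess.run(f)  # warm-up run, not included in the timing
        run_start = time.time()
        for _ in range(runs):
            sess.run(f)
        run_seconds = time.time() - run_start

    return build_seconds, run_seconds
```

Calling benchmark('/cpu:0') and benchmark('/gpu:0') and comparing the second value of each result gives a comparison that leaves out the setup cost. Note that each sess.run(f) still copies the result back to host memory, so the run time is not purely GPU compute.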

python

performance

gpu

tensorflow