Distributed (multi-device) implementation of sequence-to-sequence models in TensorFlow?

Question

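Here is a very good tutorial on training sequence-to-sequence models in TensorFlow. I am wondering whether there is a distributed version that uses a set of GPUs on a single machine to get better performance.

The TensorFlow white paper mentions that it is possible to train a large multilayer recurrent neural network (see Figure 8 and the "model parallel training" section), as used in Sequence to Sequence Learning with Neural Networks. Does anyone know whether the current tutorial covers model parallel training? If not, how can the original tutorial be improved to make use of a set of GPUs?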

Answer 1

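This example covers multi-GPU training: https://www.tensorflow.org/versions/r0.11/tutorials/deep_cnn/index.html#training-a-model-using-multiple-gpu-cards

Basically, to use multiple GPUs you have to use TensorFlow's built-in device scopes to tell it explicitly which device each part of the graph runs on. You can have a set of GPUs running in parallel and then use one device to combine their results. This involves a lot of TensorFlow scoping, and all of it has to be done explicitly. So, to use GPU 0, you would make a call similar to this: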

        with tf.device("/gpu:0"):
            # do your calculations here
            pass

Then, somewhere else in the code, there is a synchronization step:

        with tf.device(aggregation_device):
            # make your updates here (aggregation_device is a device string, e.g. "/cpu:0")
            pass

The link explains this well, but hopefully this helps you get started.
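To make the synchronization step more concrete, here is a minimal sketch of what the aggregation device does in the parallel pattern described in this answer: each tower computes a gradient on its own shard of the batch, and the gradients are averaged before a single update is applied. It is plain Python so that it runs without TensorFlow or a GPU. The toy model `y = w * x`, the learning rate, and the helper names `tower_gradient`, `average_gradients`, and `split_batch` are assumptions made for illustration and are not taken from the linked tutorial.

```python
# Sketch of the synchronization step: average per-tower gradients.
# In TensorFlow, each tower's gradient computation would sit inside its own
# `with tf.device("/gpu:N"):` block and the averaging inside the aggregation
# device's block. All names here are hypothetical.

def tower_gradient(w, xs, ys):
    """Gradient of the mean squared error of y = w * x on one tower's shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def average_gradients(tower_grads):
    """Combine the towers' gradients into one by averaging them."""
    return sum(tower_grads) / len(tower_grads)


def split_batch(items, num_towers):
    """Split a batch into equally sized shards, one per tower."""
    if len(items) % num_towers != 0:
        raise ValueError("batch size must be divisible by the number of towers")
    size = len(items) // num_towers
    return [items[i * size:(i + 1) * size] for i in range(num_towers)]


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2
num_towers = 2
learning_rate = 0.05
w = 0.0

for step in range(50):
    x_shards = split_batch(xs, num_towers)
    y_shards = split_batch(ys, num_towers)
    # "Parallel" part: one gradient per tower (computed sequentially here).
    grads = [tower_gradient(w, sx, sy) for sx, sy in zip(x_shards, y_shards)]
    # Synchronization part: average the gradients and apply a single update.
    w -= learning_rate * average_gradients(grads)
```

With equally sized shards, the averaged tower gradient equals the gradient of the full batch, so the result matches single-device training while the per-tower work can run in parallel.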

Answer 2

Take a look at this: https://github.com/shixing/xing_rnn/tree/master/Seq2Seq

It implements an attention-based seq2seq model that can place each layer on a different GPU.
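As a rough illustration of placing each layer on its own GPU, the sketch below builds a per-layer device plan. It is not the linked repository's code: the helper `assign_layers_to_devices` and the round-robin policy (used when there are more layers than GPUs) are assumptions made for this example.

```python
# Sketch of a layer-to-device plan for model parallel training.
# Hypothetical helper, not taken from the linked repository.

def assign_layers_to_devices(num_layers, num_gpus):
    """Return one device string per layer, cycling over the available GPUs."""
    if num_layers < 1 or num_gpus < 1:
        raise ValueError("num_layers and num_gpus must both be at least 1")
    return ["/gpu:%d" % (i % num_gpus) for i in range(num_layers)]


placement = assign_layers_to_devices(num_layers=4, num_gpus=4)

# In a TensorFlow model, each layer would then be built inside its own scope,
# roughly like this (`layers` and `outputs` are placeholders, shown as
# comments only):
#
#     for layer, device in zip(layers, placement):
#         with tf.device(device):
#             outputs = layer(outputs)
```

With four layers and four GPUs, every layer gets its own device. With fewer GPUs than layers, the plan wraps around and several layers share a GPU.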

gpgpu

multi-gpu

lstm

tensorflow

recurrent-neural-network