Tensorflow

Question

I have a tf.nn.rnn_cell.BasicLSTMCell as part of my neural network architecture. I use a for loop because it feeds the input recurrently over a fixed number of time steps, like this:

    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=lstm_dimensionality, name="forward_lstm")
    _, (lstm_memory, lstm_hidden) = lstm_cell(input_m, state=[lstm_memory, lstm_hidden])

    for i in range(3):
        # HERE is where the error is thrown
        _, (lstm_memory, lstm_hidden) = lstm_cell(input_m, state=[lstm_memory, lstm_hidden])
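
The following framework-free sketch shows what each `lstm_cell(...)` call in the loop above computes. `ToyLSTMCell` and `unroll` are hypothetical illustrative names, not TensorFlow APIs. The sketch makes these assumptions:

* It follows the gate layout of TF 1.x BasicLSTMCell: input, candidate, forget and output gates, with `forget_bias=1.0`.
* It uses small random-uniform weights instead of TensorFlow's default initializer.
* It starts from a zero initial state instead of the one the original code computes.

It does not reproduce the MirroredStrategy error. It only shows that one set of weights is created once and reused on every call, while the `(memory, hidden)` state is threaded from step to step.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyLSTMCell:
    """Hypothetical pure-Python LSTM cell (not a TensorFlow API).

    Follows the gate layout of TF 1.x BasicLSTMCell: the concatenated
    [input, hidden] vector is multiplied by one kernel, then split into
    input (i), candidate (j), forget (f) and output (o) gates.
    """

    def __init__(self, input_size, num_units, forget_bias=1.0, seed=0):
        rng = random.Random(seed)
        rows = input_size + num_units  # length of [input, hidden]
        cols = 4 * num_units           # one block of columns per gate
        # Weights are created once here and reused by every later call.
        # Assumption: small uniform init instead of TF's default initializer.
        self.kernel = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                       for _ in range(rows)]
        self.bias = [0.0] * cols
        self.num_units = num_units
        self.forget_bias = forget_bias

    def __call__(self, x, state):
        c, h = state  # (memory, hidden), like LSTMStateTuple(c, h)
        xh = list(x) + list(h)
        z = [sum(xh[r] * self.kernel[r][k] for r in range(len(xh))) + self.bias[k]
             for k in range(4 * self.num_units)]
        n = self.num_units
        i, j, f, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
        new_c = [c[u] * sigmoid(f[u] + self.forget_bias)
                 + sigmoid(i[u]) * math.tanh(j[u]) for u in range(n)]
        new_h = [math.tanh(new_c[u]) * sigmoid(o[u]) for u in range(n)]
        # As in BasicLSTMCell, the output is the new hidden state.
        return new_h, (new_c, new_h)


def unroll(cell, input_m, loop_steps=3):
    """Mirror the question's pattern: one call, then `loop_steps` more in a loop.

    Assumption: a zero initial state stands in for the question's initial state.
    """
    lstm_memory = [0.0] * cell.num_units
    lstm_hidden = [0.0] * cell.num_units
    _, (lstm_memory, lstm_hidden) = cell(input_m, (lstm_memory, lstm_hidden))
    for _ in range(loop_steps):
        _, (lstm_memory, lstm_hidden) = cell(input_m, (lstm_memory, lstm_hidden))
    return lstm_memory, lstm_hidden


if __name__ == "__main__":
    cell = ToyLSTMCell(input_size=2, num_units=3)
    memory, hidden = unroll(cell, [0.5, -0.25])
    print(len(memory), len(hidden))  # 3 3
```

The kernel lives on the cell object, so every iteration of the loop reads the same weights. This is roughly analogous to a TF 1.x cell, which creates its variables on the first call and reuses them on later calls.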

It runs fine locally on a single device. It also works in Google ML Engine on a single GPU. However, when I try to distribute it across 4 GPUs with tf.distribute.MirroredStrategy, it throws this exception:

ValueError: At least one of name (None) and default_name (None) must be provided.

This is confusing, because the lstm_cell callable doesn't even take a name argument.

There isn't much room to go into the details here, so I created a toy example in this Github repo to reproduce the bug in ML Engine. The error is raised specifically on this line.

TensorFlow: 1.13.1
ML Engine: --runtime-version 1.13

Answer 1

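In your code here, you use a scope inside the function compute_initial_lstm_state.

You then reuse its 2 return values here.

In other words, you use a scope to produce the values, but you assign them without a scope.

This is most likely the root cause of your error. With a single GPU, the scope can be inferred automatically. With multiple GPUs it cannot, so the call fails.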


Tensorflow - Unable to use a BasicLSTMCell with a MirroredStrategy distribution in Estimator

python

google-cloud-platform

google-cloud-ml