Understanding Tensorflow BasicLSTMCell Kernel and Bias shape

Question

I want to better understand the shapes of the kernel and bias in Tensorflow's BasicLSTMCell.

@tf_export("nn.rnn_cell.BasicLSTMCell")
class BasicLSTMCell(LayerRNNCell):

input_depth = inputs_shape[1].value
h_depth = self._num_units
self._kernel = self.add_variable(
    _WEIGHTS_VARIABLE_NAME,
    shape=[input_depth + h_depth, 4 * self._num_units])
self._bias = self.add_variable(
    _BIAS_VARIABLE_NAME,
    shape=[4 * self._num_units],
    initializer=init_ops.zeros_initializer(dtype=self.dtype))

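Why does the kernel have shape [input_depth + h_depth, 4 * self._num_units] and the bias shape [4 * self._num_units]? Maybe the factor 4 comes from the forget gate, block input, input gate and output gate? And what is the reason for summing input_depth and h_depth?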

More information about my LSTM network:

num_input = 12, timesteps = 820, num_hidden = 64, num_classes = 2.

With tf.trainable_variables() I get the following information:

Variable name: Variable:0 Shape: (64, 2) Parameters: 128
Variable name: Variable_1:0 Shape: (2,) Parameters: 2
Variable name: rnn/basic_lstm_cell/kernel:0 Shape: (76, 256) Parameters: 19456
Variable name: rnn/basic_lstm_cell/bias:0 Shape: (256,) Parameters: 256

The following code defines my LSTM network.

def RNN(x, weights, biases):

    x = tf.unstack(x, timesteps, 1)
    lstm_cell = rnn.BasicLSTMCell(num_hidden)
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    return tf.matmul(outputs[-1], weights['out']) + biases['out']

Answer 1

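First, about summing input_depth and h_depth: RNNs generally follow equations like h_t = W*h_t-1 + V*x_t to compute the state h at time t. That is, we apply a matrix multiplication to the previous state and to the current input and add the two results. This is equivalent to concatenating h_t-1 and x_t (call the result c), "stacking" the two matrices W and V side by side (call the result S), and computing S*c.
Now we have only one matrix multiplication instead of two; I believe this can be parallelized more effectively, so it is done for performance reasons. Since h_t-1 has size h_depth and x_t has size input_depth, the concatenated vector c has size input_depth + h_depth, which is why the two dimensionalities are added in the first dimension of the kernel.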
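
The equivalence described above can be checked with a small sketch in plain Python. The toy sizes, the random values and the helper `matvec` are assumptions made only for illustration; they are not taken from the TensorFlow source. The sketch uses the column-vector form S*c from the text, whereas the kernel in the question is laid out for a row vector times a matrix, which is why the summed dimension comes first in its shape.

```python
import random

def matvec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

random.seed(0)
h_depth, input_depth = 3, 2  # toy sizes, chosen only for illustration

# W acts on the previous state, V on the current input; both yield h_depth values.
W = [[random.uniform(-1, 1) for _ in range(h_depth)] for _ in range(h_depth)]
V = [[random.uniform(-1, 1) for _ in range(input_depth)] for _ in range(h_depth)]
h_prev = [random.uniform(-1, 1) for _ in range(h_depth)]
x_t = [random.uniform(-1, 1) for _ in range(input_depth)]

# Two separate multiplications followed by a sum: W*h_prev + V*x_t.
separate = [a + b for a, b in zip(matvec(W, h_prev), matvec(V, x_t))]

# One multiplication: stack W and V side by side, concatenate h_prev and x_t.
S = [w_row + v_row for w_row, v_row in zip(W, V)]  # h_depth x (h_depth + input_depth)
c = h_prev + x_t                                   # length h_depth + input_depth
combined = matvec(S, c)

# Both routes agree up to floating-point rounding.
assert all(abs(a - b) < 1e-12 for a, b in zip(separate, combined))
```

The concatenated vector has length h_depth + input_depth, so the stacked matrix needs that many columns here (or that many rows in the transposed layout).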

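Second, you are right that the factor 4 comes from the gates. This is essentially the same trick as above: instead of carrying out four separate matrix multiplications, one for the block input and one for each of the three gates, we carry out a single multiplication whose result is one big vector holding all four sets of values concatenated. We can then split this vector into four parts. In the LSTM cell source code this happens in lines 627-633.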
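
As a rough check, the shapes reported in the question follow directly from num_input = 12 and num_hidden = 64. The sketch below recomputes them and shows the four-way split on placeholder values. The names i, j, f, o and their order are how I recall the TensorFlow source labelling the four blocks, so treat them as an assumption.

```python
num_input, num_hidden = 12, 64  # values from the question

# Kernel rows cover the concatenated [input, state]; columns cover four fused blocks.
kernel_shape = (num_input + num_hidden, 4 * num_hidden)
bias_shape = (4 * num_hidden,)
kernel_params = kernel_shape[0] * kernel_shape[1]

print(kernel_shape, kernel_params)  # (76, 256) 19456
print(bias_shape)                   # (256,)

# Placeholder for one row of the fused matmul result (not real activations).
gate_inputs = list(range(4 * num_hidden))

# Split into four equal blocks of num_hidden values each.
blocks = [gate_inputs[k * num_hidden:(k + 1) * num_hidden] for k in range(4)]
i, j, f, o = blocks  # assumed order: input gate, new input, forget gate, output gate
```

Each block has num_hidden = 64 entries, one per unit, and the bias has one entry for every column of the kernel.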

Tags: machine-learning, deep-learning, lstm, tensorflow, rnn