What exactly is num_units in an LSTM cell circuit?

Question

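I have searched hard everywhere, but I cannot find out what num_units in TensorFlow actually is. I tried to relate my question to another one, but I could not get a clear explanation there.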

In TensorFlow, when creating an LSTM-based RNN, we use the following command:

from tensorflow.contrib import rnn  # TensorFlow 1.x

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

According to Colah's blog, this is a basic LSTM cell:

Now, suppose my data is:

idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hello: hihell -> ihello
x_data = [[0, 1, 0, 2, 3, 3]]   # hihell
x_one_hot = [[[1, 0, 0, 0, 0],   # h 0
              [0, 1, 0, 0, 0],   # i 1
              [1, 0, 0, 0, 0],   # h 0
              [0, 0, 1, 0, 0],   # e 2
              [0, 0, 0, 1, 0],   # l 3
              [0, 0, 0, 1, 0]]]  # l 3

y_data = [[1, 0, 2, 3, 3, 4]]    # ihello
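For reference, the one-hot tensor above does not have to be typed by hand; it can be generated from x_data. The following is a minimal pure-Python sketch (the helper name one_hot is hypothetical; in TensorFlow itself, tf.one_hot does the same job). It also makes the full shape explicit: with the batch dimension included it is [1, 6, 5].

```python
# Build the one-hot input from the index sequence used in the question.
idx2char = ['h', 'i', 'e', 'l', 'o']
x_data = [[0, 1, 0, 2, 3, 3]]   # hihell


def one_hot(batch, depth):
    """Turn a [batch][time] list of indices into a [batch][time][depth] list of 0/1 rows."""
    return [[[1 if j == idx else 0 for j in range(depth)] for idx in seq] for seq in batch]


x_one_hot = one_hot(x_data, depth=len(idx2char))
shape = (len(x_one_hot), len(x_one_hot[0]), len(x_one_hot[0][0]))
print(shape)  # (1, 6, 5) -> [batch_size, time_steps, n_input]
```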

So my input is the x_one_hot tensor above, with shape [6, 5] (strictly [1, 6, 5] once the batch dimension is included).

In this blog, there is the following picture:

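As far as I understand, BasicLSTMCell will unroll for t time steps, where t is my number of rows (please correct me if I am wrong!). For example, in the figure below, the LSTM is unrolled for t = 28 time steps.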
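The unrolling described above can be sketched as a plain loop: the same cell function (same weights) is applied once per row, and the state is carried from one step to the next. In this minimal sketch the names unroll and running_sum_cell are hypothetical, and running_sum_cell is only a placeholder to make the loop runnable; it is NOT an LSTM.

```python
def unroll(cell_fn, inputs, initial_state):
    """Apply the same cell_fn once per row (time step), threading the state through."""
    state = initial_state
    outputs = []
    for x_t in inputs:  # one iteration per row of the [time_steps, n_input] matrix
        output, state = cell_fn(x_t, state)
        outputs.append(output)
    return outputs, state


def running_sum_cell(x_t, state):
    """Placeholder cell (not an LSTM): the new state is the element-wise running sum."""
    new_state = [s + x for s, x in zip(state, x_t)]
    return new_state, new_state


rows = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]]  # hihell
outputs, final_state = unroll(running_sum_cell, rows, [0] * 5)
print(len(outputs))   # 6 -> one step per row
print(final_state)    # [2, 1, 1, 2, 0]
```

The number of iterations is fixed by the number of rows (6 here, 28 in the figure), not by anything inside the cell.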

Colah's blog says:

each line carries an entire vector

So let's see how my [6,5] input matrix passes through this LSTM-based RNN.

If my diagram above is correct, then what exactly is num_units (which we define in the LSTM cell)? Is it a parameter of the LSTM cell?

If num_units is a parameter of a single LSTM cell, then it should look something like this:

If the figure above is correct, then where are the 5 num_units in the following LSTM cell diagram (from Colah's blog)?

If you could give your answer with a figure, that would be really helpful! You can edit or create new whiteboard diagrams here.

Answer 1

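Your understanding is quite correct. Unfortunately, however, there is an inconsistency between TensorFlow terminology and the literature. To understand it, you need to dig into the TensorFlow implementation code.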

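What the TensorFlow universe calls a cell is called an LSTM layer in Colah's universe (i.e., the unrolled version). That is why, in TensorFlow, you always define a single cell rather than a layer. For example,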

cell = rnn.BasicLSTMCell(num_units=5, state_is_tuple=True)

Check the code here:

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L90

The definition of cell in this package differs from the definition used in the literature. In the literature, cell refers to an object with a single scalar output. The definition in this package refers to a horizontal array of such units.
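One way to see that a TensorFlow "cell" is a horizontal array of num_units literature-style units is to count its weights. The sketch below assumes the TF 1.x BasicLSTMCell layout, where all four gates are computed by one matrix multiply on the concatenation [x_t, h_{t-1}]; the helper names are hypothetical, and the exact layout should be checked against the source linked above.

```python
def basic_lstm_param_shapes(n_input, num_units):
    """Assumed weight shapes of a BasicLSTMCell-style layer.

    The kernel maps the concatenated [x_t, h_{t-1}] (n_input + num_units values)
    to 4 gate pre-activations for each of the num_units units.
    """
    kernel = (n_input + num_units, 4 * num_units)
    bias = (4 * num_units,)
    return kernel, bias


def num_params(n_input, num_units):
    (rows, cols), (b,) = basic_lstm_param_shapes(n_input, num_units)
    return rows * cols + b


print(basic_lstm_param_shapes(5, 5))  # ((10, 20), (20,))
print(num_params(5, 5))               # 220
```

Each of the 5 units owns 4 columns of the kernel and 4 bias entries, i.e. 4 * (5 + 5) + 4 = 44 parameters, and 5 * 44 = 220.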

So, to understand num_units in TensorFlow, it is best to picture an unrolled LSTM, as shown below.

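In the unrolled version, you have an input X_t, which is a tensor. When you give TensorFlow an input of shape

[batch_size, time_steps, n_input]

it knows from your time_steps dimension how many times to unroll.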
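To make the role of time_steps concrete, here is a minimal pure-Python sketch that splits a batch-major [batch_size, time_steps, n_input] input into time_steps slices of shape [batch_size, n_input], roughly what tf.unstack(x, axis=1) does in TF 1.x static-RNN code. The helper name unstack_time is hypothetical.

```python
def unstack_time(batch):
    """Split a [batch_size][time_steps][n_input] nested list into a list of
    time_steps slices, each of shape [batch_size][n_input]."""
    time_steps = len(batch[0])
    return [[sample[t] for sample in batch] for t in range(time_steps)]


x = [[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0],
      [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]]]  # shape [1, 6, 5]
slices = unstack_time(x)
print(len(slices))                       # 6 -> the RNN is unrolled 6 times
print(len(slices[0]), len(slices[0][0]))  # 1 5 -> each X_t is [batch_size, n_input]
```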

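So if X_t in TensorFlow is a 1-D array, then in Colah's unrolled version the x_t of each LSTM cell becomes a scalar value (note the uppercase X (vector/array) versus the lowercase x (scalar), which Colah's figures also use).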

If X_t in TensorFlow is a 2-D array, then in Colah's unrolled version the x_t of each LSTM cell becomes a 1-D array/vector (as in your case), and so on.

Now comes the most important question.

How does TensorFlow know the output/hidden dimension Z_t/H_t?

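(Note the difference between H_t and Z_t. I usually prefer to keep them separate, because H_t is fed back into the input (the recurrence), while Z_t is the output, which is not shown in the figure.)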

Does it have the same dimension as X_t?

No. It can have a different shape, and you have to specify it to TensorFlow. That is num_units: the output size.
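The point that the output width is chosen independently of the input width can be checked with a small LSTM layer written in pure Python. This is a minimal sketch, not TensorFlow's code: the class name ToyLSTM is hypothetical, the weights are random, and the gate order (input, candidate, forget, output) plus the forget_bias of 1.0 are assumed to mirror TF 1.x BasicLSTMCell to the best of my knowledge.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ToyLSTM:
    """num_units scalar LSTM units side by side (what TensorFlow calls one 'cell')."""

    def __init__(self, n_input, num_units, forget_bias=1.0, seed=0):
        rng = random.Random(seed)
        self.num_units = num_units
        self.forget_bias = forget_bias
        # One weight row per (gate, unit) pair over the concatenated [x_t, h_{t-1}].
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_input + num_units)]
                  for _ in range(4 * num_units)]
        self.b = [0.0] * (4 * num_units)

    def step(self, x_t, h, c):
        z = list(x_t) + list(h)  # length n_input + num_units
        pre = [sum(w * v for w, v in zip(row, z)) + bias
               for row, bias in zip(self.W, self.b)]
        n = self.num_units
        i = [sigmoid(v) for v in pre[0:n]]                            # input gate
        g = [math.tanh(v) for v in pre[n:2 * n]]                      # candidate values
        f = [sigmoid(v + self.forget_bias) for v in pre[2 * n:3 * n]]  # forget gate
        o = [sigmoid(v) for v in pre[3 * n:4 * n]]                    # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, rows):
        h = [0.0] * self.num_units
        c = [0.0] * self.num_units
        outputs = []
        for x_t in rows:  # one step per time step, same weights every time
            h, c = self.step(x_t, h, c)
            outputs.append(h)
        return outputs


rows = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 1, 0]]  # hihell, n_input = 5
lstm = ToyLSTM(n_input=5, num_units=3)
outputs = lstm.run(rows)
print(len(outputs), len(outputs[0]))  # 6 3 -> [time_steps, num_units]
```

With 5-dimensional inputs, choosing num_units=3 yields a 3-dimensional H_t at every one of the 6 steps; the 5 in the input never constrains it.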

Check the code here:

https://github.com/tensorflow/tensorflow/blob/ef96faaf02be54b7eb5945244c881126a4d38761/tensorflow/python/ops/rnn_cell.py#L298-L300

    @property
    def output_size(self):
        return self._num_units
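The property above says the output width is num_units. With state_is_tuple=True, the state is a (c, h) pair whose parts are also num_units wide. The sketch below mirrors that behaviour with a hypothetical ToyCellSizes class and a local namedtuple named after TensorFlow's LSTMStateTuple; it is an illustration under that assumption, not TensorFlow's class.

```python
from collections import namedtuple

# Local stand-in for the (c, h) pair TensorFlow calls LSTMStateTuple.
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


class ToyCellSizes:
    """Hypothetical cell that only reports sizes, to show where num_units appears."""

    def __init__(self, num_units, state_is_tuple=True):
        self._num_units = num_units
        self._state_is_tuple = state_is_tuple

    @property
    def output_size(self):
        return self._num_units  # width of Z_t / H_t

    @property
    def state_size(self):
        if self._state_is_tuple:
            return LSTMStateTuple(self._num_units, self._num_units)  # (c, h)
        return 2 * self._num_units  # c and h concatenated into one vector


cell = ToyCellSizes(5)
print(cell.output_size)  # 5
print(cell.state_size)   # LSTMStateTuple(c=5, h=5)
```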

TensorFlow's implementation of the LSTM cell (in the sense of Colah's universe) follows this paper:

https://arxiv.org/pdf/1409.2329.pdf
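For reference, here is the standard LSTM update written with n = num_units and d = n_input, so that every place num_units appears is visible. The dimension annotations are added here for illustration and are not copied from the paper; TensorFlow stores the transpose of W, a (d + n) × 4n kernel.

```latex
\begin{aligned}
\begin{pmatrix} i_t \\ f_t \\ o_t \\ g_t \end{pmatrix}
  &= \begin{pmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{pmatrix}
     \left( W \begin{pmatrix} x_t \\ h_{t-1} \end{pmatrix} + b \right),
  \qquad W \in \mathbb{R}^{4n \times (d+n)},\; b \in \mathbb{R}^{4n} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t),
  \qquad x_t \in \mathbb{R}^{d},\; c_t, h_t \in \mathbb{R}^{n}
\end{aligned}
```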
