Understanding TensorFlow LSTM input shape

Question

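I have a dataset X which consists of N = 4000 samples. Each sample has d = 2 features (continuous values) spanning back t = 10 time steps. I also have the corresponding 'labels' for each sample, which are also continuous values, at time step 11.

At the moment my dataset has the shape X: [4000,20], Y: [4000].

I want to train an LSTM using TensorFlow to predict the value of Y (regression), given the 10 previous inputs of d features, but I am having a tough time implementing this in TensorFlow.

The main problem I have at the moment is understanding how TensorFlow expects the input to be formatted. I have seen various examples such as this, but these examples deal with one big string of continuous time series data. My data consists of separate samples, each one an independent time series.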

Answer 1

The documentation of tf.nn.dynamic_rnn states:

inputs: The RNN inputs. If time_major == False (default), this must be a Tensor of shape: [batch_size, max_time, ...], or a nested tuple of such elements.

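In your case, this means that the input should have a shape of [batch_size, 10, 2]. Instead of training on all 4000 sequences at once, you would use only batch_size of them in each training iteration. Something like the following should work (the reshape is added for clarity):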

import tensorflow as tf  # TensorFlow 1.x API (tf.nn.dynamic_rnn, tf.contrib)

batch_size = 32
# batch_size sequences of length 10 with 2 values for each timestep.
# get_batch is a placeholder for your own batching code.
inputs = get_batch(X, batch_size).reshape([batch_size, 10, 2])
# Create LSTM cell with state size 256. Could also use GRUCell, ...
# Note: state_is_tuple=False is deprecated;
# the option might be completely removed in the future
cell = tf.nn.rnn_cell.LSTMCell(256, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell,
                                   inputs,
                                   sequence_length=[10]*batch_size,
                                   dtype=tf.float32)
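
The get_batch helper used above is not defined in the answer. As a rough sketch of what such a helper could look like in plain NumPy, the hypothetical sample_batch below draws random rows and reshapes them to [batch_size, 10, 2]. It assumes that each row of X is laid out timestep by timestep (f1_t1, f2_t1, f1_t2, f2_t2, ...); if your columns are ordered differently, the reshape has to change. The random data is only a stand-in with the shapes from the question.

```python
import numpy as np

# Stand-in data with the shapes from the question (random values, illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 20)).astype(np.float32)
Y = rng.normal(size=(4000,)).astype(np.float32)


def sample_batch(X, Y, batch_size, rng, n_steps=10, n_features=2):
    """Hypothetical helper: pick random rows and shape them for a batch-major RNN.

    Assumes each row of X is laid out timestep by timestep:
    [f1_t1, f2_t1, f1_t2, f2_t2, ..., f1_t10, f2_t10].
    """
    # Use the same indices for X and Y so inputs and labels stay aligned.
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    x_batch = X[idx].reshape(batch_size, n_steps, n_features)
    y_batch = Y[idx].reshape(batch_size, 1)
    return x_batch, y_batch


x_batch, y_batch = sample_batch(X, Y, batch_size=32, rng=rng)
print(x_batch.shape, y_batch.shape)  # (32, 10, 2) (32, 1)
```

Sampling X and Y together, rather than calling a helper once per array, keeps each sequence paired with its own label.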

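From the documentation, outputs will be of shape [batch_size, 10, 256], i.e. one 256-dimensional output for each timestep. state will be a tuple of two tensors (state.c and state.h), each of shape [batch_size, 256]. You could predict your final value, one for each sequence, from that: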

predictions = tf.contrib.layers.fully_connected(state.h,
                                                num_outputs=1,
                                                activation_fn=None)
# get_loss is a placeholder for your loss function
loss = get_loss(get_batch(Y, batch_size).reshape([batch_size, 1]), predictions)

The number 256 in the shapes of outputs and state is determined by cell.output_size and cell.state_size, respectively. When creating the LSTMCell as above, these are the same. Also see the LSTMCell documentation.
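
To make these shapes concrete without needing TensorFlow, here is a small NumPy sketch of an LSTM forward pass. This is not TensorFlow's implementation: lstm_forward is a hypothetical function with random weights, written only to show that a batch-major input of [batch_size, 10, 2] yields outputs of [batch_size, 10, 256] and a final (c, h) pair of [batch_size, 256] each.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(inputs, num_units, rng):
    """Run a randomly initialised LSTM over inputs of shape [batch, time, features].

    Hypothetical illustration only; returns (outputs, (c, h)).
    """
    batch_size, n_steps, n_features = inputs.shape
    # One weight matrix covering all four gates (input, forget, candidate, output).
    W = rng.normal(scale=0.1, size=(n_features + num_units, 4 * num_units))
    b = np.zeros(4 * num_units)
    h = np.zeros((batch_size, num_units))
    c = np.zeros((batch_size, num_units))
    outputs = np.empty((batch_size, n_steps, num_units))
    for t in range(n_steps):
        # Each step sees the features of timestep t plus the previous hidden state.
        z = np.concatenate([inputs[:, t, :], h], axis=1) @ W + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs[:, t, :] = h
    return outputs, (c, h)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(32, 10, 2))
outputs, (c, h) = lstm_forward(inputs, num_units=256, rng=rng)
print(outputs.shape)                      # (32, 10, 256)
print(c.shape, h.shape)                   # (32, 256) (32, 256)
print(np.allclose(outputs[:, -1, :], h))  # True
```

When every sequence has the full length, the last slice of outputs is the same as the final hidden state h, which is why predicting from state.h gives one value per sequence.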

Answer 2

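(This answer addresses the problem that arises when a direct np.reshape() doesn't organize the final array the way we want it. If we want to reshape directly into 3D, np.reshape will do it, but watch out for the final organization of the input.)

Having finally resolved this problem of feeding the right input shape to RNNs in my own attempts, and so as not to be confused anymore, I will give my "personal" explanation for it.

In my case (and I think many others may have this organization scheme in their feature matrices), most of the blogs out there "don't help". Let's try converting a 2D feature matrix into a 3D-shaped one for RNNs.

Let's say we have this organization type in our feature matrix: we have 5 observations (i.e. rows; by convention I think that is the most logical term to use), and in each row we have 2 features for EACH timestep (and we have 2 timesteps), like this:

(The df is only there to make my words easier to follow visually.)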

In [1]: import numpy as np                                                           

In [2]: arr = np.random.randint(0,10,20).reshape((5,4))                              

In [3]: arr                                                                          
Out[3]: 
array([[3, 7, 4, 4],
       [7, 0, 6, 0],
       [2, 0, 2, 4],
       [3, 9, 3, 4],
       [1, 2, 3, 0]])

In [4]: import pandas as pd                                                          

In [5]: df = pd.DataFrame(arr, columns=['f1_t1', 'f2_t1', 'f1_t2', 'f2_t2'])         

In [6]: df                                                                           
Out[6]: 
   f1_t1  f2_t1  f1_t2  f2_t2
0      3      7      4      4
1      7      0      6      0
2      2      0      2      4
3      3      9      3      4
4      1      2      3      0

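We will now take these values and work with them. The thing here is that RNNs incorporate the "timestep" dimension into their input, because of their architectural nature. We can imagine that dimension as stacking 2D arrays one behind the other, as many as we have timesteps. In this case, we have two timesteps, so we will have two 2D arrays stacked: one for timestep1 and, behind it, one for timestep2.

In reality, in the 3D input we need to build, we still have 5 observations. The thing is that we need to arrange them differently: the RNN will take the first row (or a specified batch, but we will keep it simple here) of the first array (i.e. timestep1) and the first row of the second stacked array (i.e. timestep2). Then the second row, and so on until the last one (the 5th in our example). So, in each row of each timestep we need to have the two features, with each timestep kept in its own array. Let's see this with the numbers.

I will make two arrays for easier understanding. Because of our organizational scheme in the df, you may have noticed that we need to take the first two columns (i.e. features 1 and 2 for timestep1) as our FIRST ARRAY OF THE STACK and the last two columns, that is, the 3rd and the 4th, as our SECOND ARRAY OF THE STACK, so that everything finally makes sense.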

In [7]: arrStack1 = arr[:,0:2]                                                       

In [8]: arrStack1                                                                    
Out[8]: 
array([[3, 7],
       [7, 0],
       [2, 0],
       [3, 9],
       [1, 2]])

In [9]: arrStack2 = arr[:,2:4]                                                       

In [10]: arrStack2                                                                   
Out[10]: 
array([[4, 4],
       [6, 0],
       [2, 4],
       [3, 4],
       [3, 0]])

Finally, the only thing we need to do is stack both arrays ("one behind the other") as if they were part of the same final structure:

In [11]: arrfinal3D = np.stack([arrStack1, arrStack2])                               

In [12]: arrfinal3D                                                                  
Out[12]: 
array([[[3, 7],
        [7, 0],
        [2, 0],
        [3, 9],
        [1, 2]],

       [[4, 4],
        [6, 0],
        [2, 4],
        [3, 4],
        [3, 0]]])

In [13]: arrfinal3D.shape                                                            
Out[13]: (2, 5, 2)

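That's it: we have our feature matrix ready to be fed into the RNN cell, taking into account how our 2D feature matrix was organized. The resulting shape (2, 5, 2) reads as (timesteps, observations, features).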
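
One thing worth checking is how this stacked array relates to a direct np.reshape, since the two give different layouts. The sketch below reuses the example values from above and assumes the same column order (f1_t1, f2_t1, f1_t2, f2_t2). The stacked array is time-major, while arr.reshape(5, 2, 2) is batch-major, which is the layout tf.nn.dynamic_rnn expects with its default time_major == False as quoted in the first answer.

```python
import numpy as np

# Same example values as in the walkthrough above.
arr = np.array([[3, 7, 4, 4],
                [7, 0, 6, 0],
                [2, 0, 2, 4],
                [3, 9, 3, 4],
                [1, 2, 3, 0]])

# Time-major result from the walkthrough: (timesteps, observations, features).
time_major = np.stack([arr[:, 0:2], arr[:, 2:4]])

# A direct reshape splits each row into (timesteps, features) and so gives
# the batch-major layout: (observations, timesteps, features).
batch_major = arr.reshape(5, 2, 2)

print(time_major.shape)   # (2, 5, 2)
print(batch_major.shape)  # (5, 2, 2)

# Swapping the first two axes converts one layout into the other.
print(np.array_equal(batch_major.transpose(1, 0, 2), time_major))  # True

# Reshaping straight to (2, 5, 2) scrambles the rows instead.
print(np.array_equal(arr.reshape(2, 5, 2), time_major))  # False
```

So whether stacking or reshaping is the right tool depends on which axis order the RNN API at hand expects.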

For all of this, you can use a one-liner:

In [14]: arrfinal3D_1 = np.stack([arr[:,0:2], arr[:,2:4]])                           

In [15]: arrfinal3D_1                                                                
Out[15]: 
array([[[3, 7],
        [7, 0],
        [2, 0],
        [3, 9],
        [1, 2]],

       [[4, 4],
        [6, 0],
        [2, 4],
        [3, 4],
        [3, 0]]])
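
With more than two timesteps, writing out every column slice by hand gets tedious. A possible generalization, sketched here with a hypothetical helper named to_time_major, splits the columns into equal groups with np.split and stacks them. It assumes the columns are grouped by timestep, as in the df above, and that the number of columns is divisible by the number of timesteps.

```python
import numpy as np


def to_time_major(features_2d, n_timesteps):
    """Hypothetical helper: convert (observations, n_timesteps * n_features)
    into (n_timesteps, observations, n_features).

    Assumes the columns are grouped by timestep: all features of timestep 1,
    then all features of timestep 2, and so on.
    """
    # np.split cuts the columns into n_timesteps equal blocks, one per timestep.
    return np.stack(np.split(features_2d, n_timesteps, axis=1))


# Same example values as in the walkthrough above.
arr = np.array([[3, 7, 4, 4],
                [7, 0, 6, 0],
                [2, 0, 2, 4],
                [3, 9, 3, 4],
                [1, 2, 3, 0]])

result = to_time_major(arr, n_timesteps=2)
print(result.shape)  # (2, 5, 2)
print(np.array_equal(result, np.stack([arr[:, 0:2], arr[:, 2:4]])))  # True

# Shapes from the original question: 4000 samples, 10 timesteps, 2 features.
X = np.zeros((4000, 20))
print(to_time_major(X, n_timesteps=10).shape)  # (10, 4000, 2)
```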
