Tensorflow: Understanding LSTM output with and without Dropout Wrapper

Question

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# Batch of 5 sequences, each with 1 time step of 2 features.
x = tf.range(1, 11, dtype=tf.float32)
x = tf.reshape(x, (5, 1, 2))

cell = tf.contrib.rnn.LSTMCell(10)
initial_state = cell.zero_state(5, dtype=tf.float32)

# Plain LSTM output.
y1, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32, initial_state=initial_state)

# Same cell object (same weights), with dropout applied only to the cell output.
y2, _ = tf.nn.dynamic_rnn(
    tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=0.5, state_keep_prob=1.0),
    x,
    dtype=tf.float32,
    initial_state=initial_state)

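I am using TensorFlow 1.8.0.

I expected y2 to be similar to y1, since y2 uses the same LSTM cell as y1 and only passes its output through a dropout layer as well. Since dropout is applied only to the output of the LSTM cell, I thought y2 would have the same values as y1, except for a few 0s here and there. But this is what I get for y1: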

<tf.Tensor: id=5540, shape=(5, 1, 10), dtype=float32, numpy=
array([[[-4.2897560e-02,  1.9367093e-01, -1.1827464e-01, -1.2339889e-01,
          1.3408028e-01,  1.3082971e-02, -2.4622230e-02, -1.5669680e-01,
          1.1127964e-01, -5.3087018e-02]],
       [[-7.1379542e-02,  4.5163053e-01, -1.6180833e-01, -1.3278724e-01,
          2.2819680e-01, -4.8406985e-02, -8.2188733e-03, -2.5466946e-01,
          2.8928292e-01, -7.3916554e-02]],
       [[-5.9056517e-02,  6.1984581e-01, -1.9882108e-01, -9.6297756e-02,
          2.5009862e-01, -8.0139056e-02, -2.2850712e-03, -2.7935350e-01,
          4.4566888e-01, -7.8914449e-02]],
       [[-3.8571563e-02,  6.9930458e-01, -2.2960691e-01, -6.1545946e-02,
          2.5194761e-01, -7.9383254e-02, -5.4560765e-04, -2.7542716e-01,
          5.5587584e-01, -7.3568568e-02]],
       [[-2.2481792e-02,  7.3400390e-01, -2.5636050e-01, -3.7012421e-02,
          2.4684550e-01, -6.3926049e-02, -1.1120128e-04, -2.5999820e-01,
          6.2801009e-01, -6.3132115e-02]]], dtype=float32)>

and for y2:

<tf.Tensor: id=5609, shape=(5, 1, 10), dtype=float32, numpy=
array([[[-0.08579512,  0.38734186, -0.23654927, -0.24679779,
          0.        ,  0.02616594, -0.        , -0.3133936 ,
          0.        , -0.        ]],
       [[-0.14275908,  0.        , -0.32361665, -0.26557449,
          0.        , -0.        , -0.        , -0.5093389 ,
          0.        , -0.        ]],
       [[-0.11811303,  0.        , -0.39764217, -0.        ,
          0.50019723, -0.16027811, -0.00457014, -0.        ,
          0.89133775, -0.        ]],
       [[-0.        ,  0.        , -0.45921382, -0.12309189,
          0.        , -0.        , -0.        , -0.        ,
          1.1117517 , -0.14713714]],
       [[-0.        ,  0.        , -0.        , -0.07402484,
          0.        , -0.        , -0.        , -0.5199964 ,
          1.2560202 , -0.        ]]], dtype=float32)>

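The non-zero values in y2 are completely different from the values at the corresponding positions in y1.

Is this a bug, or do I have the wrong idea of what it means to apply dropout to the output of an LSTM cell?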

Answer 1

y2 is equivalent to y1_drop / 0.5, where y1_drop is y1 with dropout applied.

When dropout is applied to y1 with keep probability p, the outputs that are kept are then divided by p.
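A minimal plain-Python sketch of this "inverted dropout" rule follows. It is not TensorFlow's implementation; the function name `inverted_dropout` and its `rng` parameter are hypothetical, and `keep_prob` plays the role of `output_keep_prob` in the question.

```python
import random

def inverted_dropout(values, keep_prob, rng=random):
    """Hypothetical helper: keep each element with probability keep_prob and
    divide the kept ones by keep_prob; dropped elements become 0.0.

    A plain-Python sketch of inverted dropout, not TensorFlow's own code.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    # rng.random() is uniform on [0, 1), so "< keep_prob" keeps with probability keep_prob.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

# Usage: first four entries of y1's first row, with keep_prob=0.5 as in the question.
y1_row = [-4.2897560e-02, 1.9367093e-01, -1.1827464e-01, -1.2339889e-01]
y2_row = inverted_dropout(y1_row, keep_prob=0.5, rng=random.Random(0))
# Every entry of y2_row is either 0.0 or exactly twice the matching y1_row entry.
print(y2_row)
```

Which entries survive depends on the random draw, but the kept ones are always scaled by 1/keep_prob, which is why the non-zero values in y2 do not match y1 directly.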

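If you compare the two matrices, y2 is nothing more than y1 with roughly half of the entries randomly set to 0 (each entry is dropped independently with probability 1 - 0.5) and the remaining entries divided by 0.5, i.e. doubled. For example, y1[0, 0, 0] = -0.0429 while y2[0, 0, 0] = -0.0858.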
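To check this against the numbers in the question, the sketch below copies the first row of y1 and y2 from the printed tensors and verifies that every non-zero entry of y2 equals the corresponding y1 entry divided by 0.5. The `rel_tol=1e-6` tolerance is an assumption chosen to absorb the rounding in the printed float32 values.

```python
import math

# First time step of y1 and y2, copied from the outputs printed in the question.
y1_row = [-4.2897560e-02, 1.9367093e-01, -1.1827464e-01, -1.2339889e-01,
          1.3408028e-01, 1.3082971e-02, -2.4622230e-02, -1.5669680e-01,
          1.1127964e-01, -5.3087018e-02]
y2_row = [-0.08579512, 0.38734186, -0.23654927, -0.24679779,
          0.0, 0.02616594, -0.0, -0.3133936, 0.0, -0.0]
keep_prob = 0.5  # output_keep_prob passed to DropoutWrapper

# -0.0 == 0.0 in Python, so negative zeros count as dropped too.
kept = [i for i, v in enumerate(y2_row) if v != 0.0]
dropped = [i for i, v in enumerate(y2_row) if v == 0.0]

# Every surviving entry of y2 should equal the matching y1 entry divided by keep_prob.
for i in kept:
    assert math.isclose(y2_row[i], y1_row[i] / keep_prob, rel_tol=1e-6), i

print("kept:", kept)        # prints: kept: [0, 1, 2, 3, 5, 7]
print("dropped:", dropped)  # prints: dropped: [4, 6, 8, 9]
```

Multiplying y2 by 0.5 would therefore give back y1 at every position that was not dropped.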

Quoting Section 10 of the Dropout paper:

"We described dropout as a method where we retain units with probability p at training time and scale down the weights by multiplying them by a factor of p at test time. Another way to achieve the same effect is to scale up the retained activations by multiplying by 1/p at training time and not modifying the weights at test time. These methods are equivalent with appropriate scaling of the learning rate and weight initializations at each layer."

Reference: Dropout: A Simple Way to Prevent Neural Networks from Overfitting
