Tensorflow: Understanding LSTM output with and without Dropout Wrapper
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
x = tf.range(1, 11, dtype=tf.float32)
x = tf.reshape(x, (5, 1, 2))
cell = tf.contrib.rnn.LSTMCell(10)
initial_state = cell.zero_state(5, dtype=tf.float32)
y1, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32, initial_state=initial_state)
y2, _ = tf.nn.dynamic_rnn(
    tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=0.5, state_keep_prob=1.0),
    x,
    dtype=tf.float32,
    initial_state=initial_state)
I am using Tensorflow 1.8.0.
I expected the output of y2 to be similar to y1, since y2 uses the same LSTM cell as y1, only with its output also passed through a dropout layer. Because dropout is applied only to the outputs of the LSTM cell, I figured the values of y2 would be the same as those of y1, except for a few zeros here and there. But this is what I get for y1:
<tf.Tensor: id=5540, shape=(5, 1, 10), dtype=float32, numpy=
array([[[-4.2897560e-02, 1.9367093e-01, -1.1827464e-01, -1.2339889e-01,
1.3408028e-01, 1.3082971e-02, -2.4622230e-02, -1.5669680e-01,
1.1127964e-01, -5.3087018e-02]],
[[-7.1379542e-02, 4.5163053e-01, -1.6180833e-01, -1.3278724e-01,
2.2819680e-01, -4.8406985e-02, -8.2188733e-03, -2.5466946e-01,
2.8928292e-01, -7.3916554e-02]],
[[-5.9056517e-02, 6.1984581e-01, -1.9882108e-01, -9.6297756e-02,
2.5009862e-01, -8.0139056e-02, -2.2850712e-03, -2.7935350e-01,
4.4566888e-01, -7.8914449e-02]],
[[-3.8571563e-02, 6.9930458e-01, -2.2960691e-01, -6.1545946e-02,
2.5194761e-01, -7.9383254e-02, -5.4560765e-04, -2.7542716e-01,
5.5587584e-01, -7.3568568e-02]],
[[-2.2481792e-02, 7.3400390e-01, -2.5636050e-01, -3.7012421e-02,
2.4684550e-01, -6.3926049e-02, -1.1120128e-04, -2.5999820e-01,
6.2801009e-01, -6.3132115e-02]]], dtype=float32)>
and y2:
<tf.Tensor: id=5609, shape=(5, 1, 10), dtype=float32, numpy=
array([[[-0.08579512, 0.38734186, -0.23654927, -0.24679779,
0. , 0.02616594, -0. , -0.3133936 ,
0. , -0. ]],
[[-0.14275908, 0. , -0.32361665, -0.26557449,
0. , -0. , -0. , -0.5093389 ,
0. , -0. ]],
[[-0.11811303, 0. , -0.39764217, -0. ,
0.50019723, -0.16027811, -0.00457014, -0. ,
0.89133775, -0. ]],
[[-0. , 0. , -0.45921382, -0.12309189,
0. , -0. , -0. , -0. ,
1.1117517 , -0.14713714]],
[[-0. , 0. , -0. , -0.07402484,
0. , -0. , -0. , -0.5199964 ,
1.2560202 , -0. ]]], dtype=float32)>
The non-zero values in y2 are completely different from the values at the corresponding positions in y1.
Is this a bug, or is my idea of what it means to apply dropout to the output of an LSTM cell wrong?
y2 is equivalent to y1_drop/0.5.
When dropout is applied to y1 with keep probability p, the outputs are then divided by p.
If you inspect the two matrices, y2 is nothing but y1 with half of its values randomly dropped and the remaining values divided by 0.5, i.e. doubled.
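You can confirm this by comparing the two tensors directly. A minimal check, assuming y1 and y2 are the eager tensors computed in the question (with a single time step, no dropped value is carried forward through the recurrent state, so the match is exact):

import numpy as np

# Every non-zero entry of y2 should equal the matching entry of y1
# divided by the keep probability (0.5), i.e. doubled.
keep_prob = 0.5
kept = y2.numpy() != 0.0
print(np.allclose(y2.numpy()[kept], y1.numpy()[kept] / keep_prob))  # True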
Quoting from Section 10 of the Dropout paper:
"We described dropout as a method where we retain units with
probability p
at training time and scale down the weights by
multiplying them by a factor of p
at test time. Another way to
achieve the same effect is to scale up the retained activations by
multiplying by 1/p
at training time and not modifying the weights at
test time. These methods are equivalent with appropriate scaling of
the learning rate and weight initializations at each layer."
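This "inverted" scaling is what tf.nn.dropout (which DropoutWrapper uses internally) does in TensorFlow 1.x: kept activations are scaled up at training time so nothing has to be rescaled at test time. A minimal NumPy sketch of the idea (the function name and seed are just for illustration):

import numpy as np

def inverted_dropout(x, keep_prob, seed=0):
    # Keep each unit with probability keep_prob and scale the
    # survivors by 1/keep_prob, so the expected activation of each
    # unit is unchanged and no rescaling is needed at test time.
    rng = np.random.RandomState(seed)
    mask = rng.uniform(size=x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)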