using LSTMs Decoder without teacher forcing - Tensorflow
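I'm trying to build a sequence-to-sequence model in TensorFlow. I followed several tutorials and everything worked fine until I decided to remove teacher forcing from my model.
Below is an example of the decoder network I'm using: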
import tensorflow as tf  # TensorFlow 1.x; tf.contrib was removed in TensorFlow 2.x

def decoding_layer_train(encoder_state, dec_cell, dec_embed_input,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embed_input: Decoder embedded input
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    # TrainingHelper feeds the embedded ground-truth targets at every step (teacher forcing)
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                        sequence_length=target_sequence_length,
                                                        time_major=False)
    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, encoder_state, output_layer)
    # dynamic_decode returns (final_outputs, final_state, final_sequence_lengths); keep the outputs
    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)[0]
    return training_decoder_output
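As far as I understand, TrainingHelper is what performs teacher forcing; in particular, it takes the true outputs as part of its arguments. I tried to use the decoder without the training helper, but it seems to be mandatory. I tried setting the true outputs to 0, but apparently TrainingHelper needs the real outputs. I also tried to Google a solution, but I didn't find anything relevant.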
=================== UPDATE ===================
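I apologize for not mentioning this earlier, but I also tried GreedyEmbeddingHelper. The model runs for a few iterations and then starts throwing a runtime error. It looks like GreedyEmbeddingHelper starts predicting outputs whose shape differs from the expected one. Below is my function when using GreedyEmbeddingHelper: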
def decoding_layer_train(encoder_state, dec_cell, dec_embeddings,
                         target_sequence_length, max_summary_length,
                         output_layer, keep_prob):
    """
    Create a decoding layer for training
    :param encoder_state: Encoder State
    :param dec_cell: Decoder RNN Cell
    :param dec_embeddings: Decoder embedding matrix
    :param target_sequence_length: The lengths of each sequence in the target batch
    :param max_summary_length: The length of the longest sequence in the batch
    :param output_layer: Function to apply the output layer
    :param keep_prob: Dropout keep probability
    :return: BasicDecoderOutput containing training logits and sample_id
    """
    # target_vocab_to_int and batch_size are not parameters; they are defined elsewhere in the notebook
    start_tokens = tf.tile(tf.constant([target_vocab_to_int['<GO>']], dtype=tf.int32), [batch_size], name='start_tokens')
    # GreedyEmbeddingHelper feeds the embedded argmax of the previous step instead of the true target
    training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings,
                                                               start_tokens,
                                                               target_vocab_to_int['<EOS>'])
    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell, training_helper, encoder_state, output_layer)
    training_decoder_output = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                impute_finished=True,
                                                                maximum_iterations=max_summary_length)[0]
    return training_decoder_output
Here is an example of the error thrown after several training iterations:
Ok
Epoch 0 Batch 5/91 - Train Accuracy: 0.4347, Validation Accuracy: 0.3557, Loss: 2.8656
++++Epoch 0 Batch 5/91 - Train WER: 1.0000, Validation WER: 1.0000
Epoch 0 Batch 10/91 - Train Accuracy: 0.4050, Validation Accuracy: 0.3864, Loss: 2.6347
++++Epoch 0 Batch 10/91 - Train WER: 1.0000, Validation WER: 1.0000
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-115-1d2a9495ad42> in <module>()
57 target_sequence_length: targets_lengths,
58 source_sequence_length: sources_lengths,
---> 59 keep_prob: keep_probability})
60
61
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
1116 if final_fetches or final_targets or (handle and feed_dict_tensor):
1117 results = self._do_run(handle, final_targets, final_fetches,
-> 1118 feed_dict_tensor, options, run_metadata)
1119 else:
1120 results = []
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1313 if handle is None:
1314 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1315 options, run_metadata)
1316 else:
1317 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
/Users/alsulaimi/Documents/AI/Tensorflow-make/workspace/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1332 except KeyError:
1333 pass
-> 1334 raise type(e)(node_def, op, message)
1335
1336 def _extend_graph(self):
InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [1100,78] and labels shape [1400]
I'm not sure, but I suspect GreedyEmbeddingHelper isn't meant to be used for training. I would appreciate any help and ideas on how to stop teacher forcing.
Thanks.
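There are different helpers, and they all inherit from the same base class. You can find more information in the documentation. As you said, TrainingHelper requires predefined ground-truth inputs, i.e. the tokens the decoder is expected to output, and at each step it feeds the ground-truth token as the next input (instead of the output of the previous step). According to some research, this approach should speed up training of the decoder.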
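To make the difference concrete, here is a minimal pure-Python sketch (not TensorFlow code) of the two feeding strategies. `toy_step` is a hypothetical stand-in for "LSTM cell + output layer" that works on token ids directly instead of embeddings; the two decode functions only imitate what TrainingHelper and GreedyEmbeddingHelper feed into each step.

```python
from typing import Callable, List

# A "step" maps the previous token id to logits over the vocabulary.
Step = Callable[[int], List[float]]


def toy_step(prev_token: int) -> List[float]:
    """Hypothetical stand-in for the decoder cell + output layer (vocab size 5).

    It always puts the highest logit on (prev_token + 1) % 5, so the effect of
    each feeding strategy is easy to follow by hand.
    """
    logits = [0.0] * 5
    logits[(prev_token + 1) % 5] = 1.0
    return logits


def argmax(xs: List[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def decode_teacher_forcing(step: Step, go_token: int, targets: List[int]) -> List[int]:
    """TrainingHelper-style: step t is fed the ground-truth token t-1 (GO at t=0)."""
    inputs = [go_token] + targets[:-1]
    return [argmax(step(tok)) for tok in inputs]


def decode_free_running(step: Step, go_token: int, num_steps: int) -> List[int]:
    """GreedyEmbeddingHelper-style: step t is fed the model's own argmax from t-1."""
    predictions, prev = [], go_token
    for _ in range(num_steps):
        prev = argmax(step(prev))
        predictions.append(prev)
    return predictions


targets = [3, 1, 4]
print(decode_teacher_forcing(toy_step, 0, targets))    # [1, 4, 2]
print(decode_free_running(toy_step, 0, len(targets)))  # [1, 2, 3]
```

With teacher forcing every step sees the true previous token, so a mistake at one step does not propagate; in free-running mode each prediction becomes the next input.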
In your case, you are looking for GreedyEmbeddingHelper. Just use it instead of TrainingHelper:
# start_tokens: int32 vector of shape [batch_size]; end_token: int32 scalar
training_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=embedding,
    start_tokens=tf.tile([GO_SYMBOL], [batch_size]),
    end_token=END_SYMBOL)
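Just replace embedding and the other names with the tensors and variables you used in your question. This helper automatically takes the output of each step (the argmax of its logits), passes it through the embedding, and feeds the result as the input of the next step. The first step uses start_tokens.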
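The stopping behaviour is what causes the shape problem in the question. Below is a simplified pure-Python model of a batched greedy decoding loop (an illustration, not TensorFlow's implementation): it stops once every sequence has emitted the end token or after `maximum_iterations` steps. The toy step function, the token ids, and `pad_id` are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical batched step: previous token ids -> next token ids (already argmax'ed).
BatchStep = Callable[[List[int]], List[int]]


def greedy_decode_batch(step: BatchStep, start_tokens: List[int], end_token: int,
                        maximum_iterations: int, pad_id: int = 0
                        ) -> Tuple[List[List[int]], List[int]]:
    """Simplified greedy decoding loop.

    Returns ([batch][time] token ids, per-sequence lengths). The time dimension
    equals the number of steps actually run, which can be shorter than the targets.
    """
    batch = len(start_tokens)
    prev = list(start_tokens)
    finished = [False] * batch
    lengths = [0] * batch
    outputs: List[List[int]] = [[] for _ in range(batch)]
    for t in range(maximum_iterations):
        nxt = step(prev)
        for b in range(batch):
            if finished[b]:
                outputs[b].append(pad_id)  # filler after a sequence is done
            else:
                outputs[b].append(nxt[b])
                lengths[b] = t + 1  # in this sketch, length includes the end-token step
                finished[b] = nxt[b] == end_token
        prev = nxt
        if all(finished):  # every sequence emitted end_token: stop early
            break
    return outputs, lengths


# Toy step: always emits prev + 1; token 3 plays the role of <EOS>.
outs, lens = greedy_decode_batch(lambda p: [x + 1 for x in p],
                                 start_tokens=[0, 1], end_token=3,
                                 maximum_iterations=5)
print(outs, lens)  # [[1, 2, 3], [2, 3, 0]] [3, 2]
```

Even though up to 5 steps were allowed, only 3 time steps are produced, so the decoder output is shorter than a 5-step target.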
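The output generated with GreedyEmbeddingHelper does not have to match the length of the expected output, because decoding stops as soon as every sequence in the batch has emitted end_token (or when maximum_iterations is reached). You have to use padding to make their shapes match; TensorFlow provides the function tf.pad() for this. Also, tf.contrib.seq2seq.dynamic_decode returns a tuple (final_outputs, final_state, final_sequence_lengths), so you can use the value of final_sequence_lengths for the padding.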
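This also explains the error in the question. Presumably the loss (e.g. tf.contrib.seq2seq.sequence_loss) flattens logits to [batch * decoded_steps, vocab] and labels to [batch * target_steps]. If the batch size were 100 (an assumption; the question does not state it), 11 decoded steps against 14 target steps would give exactly [1100, 78] vs [1400]. The pure-Python sketch below pads batch-major [batch][time][vocab] logits along the time axis; for a batch-major TensorFlow tensor of that shape, the paddings argument of tf.pad would be [[0, 0], [0, n], [0, 0]]. The function names are hypothetical.

```python
from typing import List

Logits = List[List[List[float]]]  # [batch][time][vocab]


def pad_logits_time(logits: Logits, target_len: int, vocab_size: int,
                    pad_value: float = 0.0) -> Logits:
    """Pad each sequence along the time axis up to target_len (never truncates)."""
    padded = []
    for seq in logits:
        extra = max(target_len - len(seq), 0)  # same role as tf.maximum(..., 0)
        padded.append([list(row) for row in seq]
                      + [[pad_value] * vocab_size for _ in range(extra)])
    return padded


def flat_rows(logits: Logits) -> int:
    """Number of rows after reshaping [batch, time, vocab] to [batch*time, vocab]."""
    return sum(len(seq) for seq in logits)


batch, decoded_len, target_len, vocab = 2, 3, 5, 4
logits = [[[0.1] * vocab for _ in range(decoded_len)] for _ in range(batch)]
num_labels = batch * target_len

print(flat_rows(logits), num_labels)  # 6 10: mismatched first dimensions
padded = pad_logits_time(logits, target_len, vocab)
print(flat_rows(padded), num_labels)  # 10 10
```

The padded positions carry no real prediction, so they would usually be masked out of the loss (for example through the weights argument of sequence_loss).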
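# final_seq_lengths: third element returned by dynamic_decode
# expected_length: number of time steps in the targets
# As written, this pads a 2-D logits tensor [time, vocab] and a 1-D targets
# tensor [time]; batched tensors need one more [0, 0] entry for the batch axis.
logits_pad = tf.pad(
    logits,
    [[0, tf.maximum(expected_length - tf.reduce_max(final_seq_lengths), 0)],
     [0, 0]],
    constant_values=PAD_VALUE,
    mode='CONSTANT')
targets_pad = tf.pad(
    targets,
    [[0, tf.maximum(tf.reduce_max(final_seq_lengths) - expected_length, 0)]],
    constant_values=PAD_VALUE,
    mode='CONSTANT')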
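You may need to adjust the padding slightly depending on the shape of your inputs. Also, if you set the maximum_iterations argument to match the shape of targets, you don't have to pad targets.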
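The arithmetic behind the two tf.maximum(...) terms, and why capping maximum_iterations at the target length removes the need to pad targets, can be sketched in plain Python. `pad_amounts` and `decoded_len` are hypothetical helpers, and `natural_len` stands for the step at which all sequences would finish without a cap.

```python
from typing import Tuple


def pad_amounts(decoded_len: int, expected_len: int) -> Tuple[int, int]:
    """Return (logits_pad, targets_pad), mirroring the two tf.maximum(...) terms."""
    logits_pad = max(expected_len - decoded_len, 0)
    targets_pad = max(decoded_len - expected_len, 0)
    return logits_pad, targets_pad


def decoded_len(natural_len: int, maximum_iterations: int) -> int:
    """The decoder runs at most maximum_iterations steps."""
    return min(natural_len, maximum_iterations)


expected_len = 14
print(pad_amounts(11, expected_len))  # (3, 0): pad logits only
print(pad_amounts(16, expected_len))  # (0, 2): pad targets only

# With maximum_iterations = expected_len, targets never need padding.
assert all(pad_amounts(decoded_len(n, expected_len), expected_len)[1] == 0
           for n in range(1, 50))
```

At most one of the two amounts is non-zero, so only one tensor is ever padded.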