Defining the dimensions of the decoder in NMT and image captioning with attention
I have been carefully going through the models in the tutorials below.
https://www.tensorflow.org/tutorials/text/nmt_with_attention
and
https://www.tensorflow.org/tutorials/text/image_captioning
In both tutorials, I can't understand the part where the decoder is defined.
In NMT with attention, the decoder part is as follows:
class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)

        return x, state, attention_weights
Here, at the step x = self.embedding(x) (commented # x shape after passing through embedding == (batch_size, 1, embedding_dim)), what is x supposed to be? Is it just the target input?
Also, above that, I don't understand why the output shape has to be (batch_size * 1, hidden_size). Why batch_size * 1?
And the image captioning decoder part is as follows:
class RNN_Decoder(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super(RNN_Decoder, self).__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(self.units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)

        self.attention = BahdanauAttention(self.units)

    def call(self, x, features, hidden):
        # defining attention as a separate model
        context_vector, attention_weights = self.attention(features, hidden)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # shape == (batch_size, max_length, hidden_size)
        x = self.fc1(output)

        # x shape == (batch_size * max_length, hidden_size)
        x = tf.reshape(x, (-1, x.shape[2]))

        # output shape == (batch_size * max_length, vocab)
        x = self.fc2(x)

        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))
Why does the output shape have to be reshaped to (batch_size * max_length, hidden_size)?
Could someone give me more details? It would really help me.
The reason for the reshaping is that the fully connected layer in TensorFlow (unlike in PyTorch) only accepts two-dimensional input.
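To see concretely why that reshape yields (batch_size * 1, hidden_size), here is a minimal sketch with illustrative values (batch_size = 64 and hidden_size = 512 are just example numbers, not anything fixed by the tutorials):

import tensorflow as tf

# GRU output for a single decoding step: (batch_size, 1, hidden_size)
output = tf.random.normal((64, 1, 512))

# the reshape merges the batch axis and the length-1 time axis,
# so batch_size * 1 is simply batch_size
output = tf.reshape(output, (-1, output.shape[2]))
print(output.shape)  # (64, 512)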
In the first example, the decoder's call method is meant to be executed inside a for loop over time steps (both at training and at inference time). However, a GRU expects input of shape batch × length × dim, so if you call it step by step, the length is 1.
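For concreteness, a sketch of that per-time-step loop with teacher forcing, roughly following the NMT tutorial's train_step (targ is the batch of target token ids; targ_lang, enc_hidden, enc_output, and loss_function are as defined in that tutorial):

# start every sentence in the batch with the <start> token
dec_input = tf.expand_dims([targ_lang.word_index['<start>']] * batch_sz, 1)
dec_hidden = enc_hidden
loss = 0

for t in range(1, targ.shape[1]):
    # the x fed to the decoder has shape (batch_size, 1): one token per sentence
    predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
    loss += loss_function(targ[:, t], predictions)
    # teacher forcing: the ground-truth token becomes the next input
    dec_input = tf.expand_dims(targ[:, t], 1)

This also answers the question about x: it is the previous target token (the ground truth during training, the model's own prediction at inference time).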
In the second example, you can call the decoder on the entire ground-truth sequence at training time, but it still works with length 1, so you can use it in a for loop at inference time.
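A sketch of that inference-time loop for the captioning decoder, roughly the shape of the tutorial's evaluate function (tokenizer, features, i.e. the encoded image, and max_length are as defined there; sampling with tf.random.categorical is the tutorial's choice, greedy argmax would also work):

hidden = decoder.reset_state(batch_size=1)
dec_input = tf.expand_dims([tokenizer.word_index['<start>']], 0)

result = []
for i in range(max_length):
    # dec_input still has length 1, so the decoder works unchanged
    predictions, hidden, attention_weights = decoder(dec_input, features, hidden)
    # predictions has shape (1 * 1, vocab) because the length is 1
    predicted_id = tf.random.categorical(predictions, 1)[0][0].numpy()
    result.append(tokenizer.index_word[predicted_id])
    if tokenizer.index_word[predicted_id] == '<end>':
        break
    # feed the sampled token back in, again with length 1
    dec_input = tf.expand_dims([predicted_id], 0)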