TensorFlow: ValueError: Shapes are incompatible
I am having some trouble getting the data shapes right for my encoder-decoder model. The problem seems to be related to the Dense layer, but I can't figure out why there is an incompatibility. Can anyone help me?
Error message
ValueError: Shapes (None, 6) and (None, 6, 1208) are incompatible
Model
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Define an input sequence and process it.
encoder_inputs = Input(shape=(35,), name='encoder_inputs')
decoder_inputs = Input(shape=(6,), name='decoder_inputs')
embedding = Embedding(input_dim=vocab_size, output_dim=160, mask_zero=True)
encoder_embeddings = embedding(encoder_inputs)
decoder_embeddings = embedding(decoder_inputs)
encoder_lstm = LSTM(512, return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_embeddings)
# We discard `LSTM_outputs` and only keep the other states.
encoder_states = [state_h, state_c]
decoder_lstm = LSTM(512, return_sequences=True, return_state=True, name='decoder_lstm')
# Set up the decoder, using `context vector` as initial state.
decoder_outputs, _, _ = decoder_lstm(decoder_embeddings,
                                     initial_state=encoder_states)
# Complete the decoder model by adding a Dense layer with a softmax activation
# function for prediction of the next output.
decoder_dense = Dense(target_vocab_size, activation='softmax', name='decoder_dense')
decoder_outputs = decoder_dense(decoder_outputs)
# put together
model_encoder_training = Model([encoder_inputs, decoder_inputs], decoder_outputs, name='model_encoder_training')
Model: "model_encoder_training"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
decoder_inputs (InputLayer) [(None, 6)] 0
__________________________________________________________________________________________________
encoder_inputs (InputLayer) [(None, 35)] 0
__________________________________________________________________________________________________
embedding_12 (Embedding) multiple 457120 encoder_inputs[0][0]
decoder_inputs[0][0]
__________________________________________________________________________________________________
encoder_lstm (LSTM) [(None, 512), (None, 1378304 embedding_12[0][0]
__________________________________________________________________________________________________
decoder_lstm (LSTM) [(None, 6, 512), (No 1378304 embedding_12[1][0]
encoder_lstm[0][1]
encoder_lstm[0][2]
__________________________________________________________________________________________________
decoder_dense (Dense) (None, 6, 1208) 619704 decoder_lstm[0][0]
==================================================================================================
Total params: 3,833,432
Trainable params: 3,833,432
Non-trainable params: 0
__________________________________________________________________________________________________
Variables and additional information
X_train.shape = (24575, 35)
y_train.shape = (24575, 6)
X_decoder.shape = (24575, 6)
vocab_size = 2857
target_vocab_size = 1208
You should make sure to use tf.keras.losses.SparseCategoricalCrossentropy() as your loss function and to wrap the final Dense layer in a TimeDistributed layer. decoder_lstm (LSTM) returns a sequence of shape (None, 6, 512), to which you are applying a Dense layer, but as the docs mention:
If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs [...]
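To see what the quoted behaviour means in practice, here is a minimal shape check (a sketch, using the same dimensions as your decoder output):

import tensorflow as tf

# A plain Dense layer applied to a rank-3 (batch, timesteps, features) tensor
# only contracts the last axis; the same kernel is reused at every timestep.
x = tf.random.normal((2, 6, 512))                      # like the decoder_lstm output
dense = tf.keras.layers.Dense(1208, activation='softmax')
print(dense(x).shape)                                  # (2, 6, 1208)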
So the final Dense layer essentially ignores the 6 timesteps and is applied along the last dimension of size 512, which is probably not what you want. With a TimeDistributed layer, you instead apply the Dense layer with its softmax activation to each of the n timesteps individually, computing a probability over every one of the 1208 words in the target vocabulary. Here is a working example:
import tensorflow as tf

vocab_size = 2857
target_vocab_size = 1208

# Encoder: embed the source sequence and keep only the final LSTM states.
encoder_inputs = tf.keras.layers.Input(shape=(35,), name='encoder_inputs')
decoder_inputs = tf.keras.layers.Input(shape=(6,), name='decoder_inputs')
embedding = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=160, mask_zero=True)
encoder_embeddings = embedding(encoder_inputs)
decoder_embeddings = embedding(decoder_inputs)
encoder_lstm = tf.keras.layers.LSTM(512, return_state=True, name='encoder_lstm')
LSTM_outputs, state_h, state_c = encoder_lstm(encoder_embeddings)
encoder_states = [state_h, state_c]

# Decoder: initialise the LSTM with the encoder states and return the full sequence.
decoder_lstm = tf.keras.layers.LSTM(512, return_sequences=True, return_state=True, name='decoder_lstm')
decoder_outputs, _, _ = decoder_lstm(decoder_embeddings,
                                     initial_state=encoder_states)

# Apply the softmax Dense layer to every timestep via TimeDistributed.
decoder_dense = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(target_vocab_size, activation='softmax', name='decoder_dense'))
decoder_outputs = decoder_dense(decoder_outputs)

model_encoder_training = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs, name='model_encoder_training')

# SparseCategoricalCrossentropy accepts integer targets of shape (batch, 6),
# so y_train does not need to be one-hot encoded.
model_encoder_training.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy())

# Train on random dummy data just to show that the shapes line up.
samples = 100
X_train = tf.random.uniform((samples, 35), maxval=vocab_size, dtype=tf.int32)
X_decoder = tf.random.uniform((samples, 6), maxval=vocab_size, dtype=tf.int32)
y_train = tf.random.uniform((samples, 6), maxval=target_vocab_size, dtype=tf.int32)
model_encoder_training.fit([X_train, X_decoder], y_train, epochs=5, batch_size=10)
Epoch 1/5
10/10 [==============================] - 8s 302ms/step - loss: 7.0967
Epoch 2/5
10/10 [==============================] - 3s 300ms/step - loss: 6.8687
Epoch 3/5
10/10 [==============================] - 3s 302ms/step - loss: 6.5024
Epoch 4/5
10/10 [==============================] - 3s 300ms/step - loss: 6.1527
Epoch 5/5
10/10 [==============================] - 3s 300ms/step - loss: 5.9458
<keras.callbacks.History at 0x7f88cb66a990>
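The error Shapes (None, 6) and (None, 6, 1208) are incompatible is what you typically get when the model is compiled with categorical_crossentropy but the targets are integer class indices of shape (batch, 6). If you would rather keep that loss instead of switching to the sparse variant (an assumption, since the compile call is not shown in the question), the alternative is to one-hot encode the targets so that they match the softmax output:

# Alternative sketch: keep categorical_crossentropy, but one-hot encode the
# integer targets so they have shape (samples, 6, 1208) like the model output.
y_train_onehot = tf.keras.utils.to_categorical(y_train, num_classes=target_vocab_size)

model_encoder_training.compile(optimizer='adam', loss='categorical_crossentropy')
model_encoder_training.fit([X_train, X_decoder], y_train_onehot, epochs=5, batch_size=10)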