Tensorflow ValueError: Dimensions must be equal: LSTM+MDN
I am trying to build a next-word prediction model with an LSTM + Mixture Density Network, based on this implementation: https://www.katnoria.com/mdn/.
Input: 300-dimensional word vectors × window size (5), plus a 21-dimensional array (c) representing the document's topic distribution, which is used to train the initial hidden state.
Output: mixing coefficients × num_gaussians, variances × num_gaussians, and means × num_gaussians × 300 (the vector size).
For an experimental run with 161 observations, x.shape, y.shape, c.shape give me:
(TensorShape([161, 5, 300]), TensorShape([161, 300]), TensorShape([161, 21]))
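For reference, dummy tensors with these shapes could look like this (the real data is word vectors and topic distributions, so the random values here are only placeholders):

import tensorflow as tf

x = tf.random.normal((161, 5, 300))    # window of 5 word vectors per sample
y = tf.random.normal((161, 300))       # next-word vector to predict
c = tf.random.uniform((161, 21))       # per-document topic distribution

The model is built as follows: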
from tensorflow.keras.layers import Input, Dense, LSTM, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.math import exp
# n_feat is size of word vector
n_feat = 300
window = 5
l = (window, n_feat)
hidden_state_dim = 21
# Number of gaussians to represent the multimodal distribution
k = 26
# Initial state: derive the LSTM's initial h and c states from the topic distribution
mlp_inp = Input(shape=(hidden_state_dim,))
mlp_dense_h = Dense(128, activation='relu', name="dense_h")(mlp_inp)
mlp_dense_c = Dense(128, activation='relu', name="dense_c")(mlp_inp)
# Network
input = Input(shape=l)
layer1 = LSTM(128, return_sequences=True, name='baselayer1')(input, initial_state=[mlp_dense_h, mlp_dense_c])
layer2 = LSTM(128, name='baselayer2')(layer1)
# Mean
mu = Dense((n_feat * k), activation=None, name='mean_layer')(layer2)
# variance (should be greater than 0 so we exponentiate it)
var_layer = Dense(k, activation=None, name='dense_var_layer')(layer2)
var = Lambda(lambda x: exp(x), output_shape=(k,), name='variance_layer')(var_layer)
# mixing coefficient should sum to 1.0
pi = Dense(k, activation='softmax', name='pi_layer')(layer2)
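The Model call itself is not in the snippet above; reconstructed from the summary and traceback below, it is presumably assembled like this (the input order and the optimizer are assumptions):

model = Model(inputs=[mlp_inp, input], outputs=[pi, mu, var])
optimizer = Adam()  # assumed; Adam is imported above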
Here is my model's .summary():
Model: "model_12"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_7 (InputLayer) [(None, 21)] 0
__________________________________________________________________________________________________
input_8 (InputLayer) [(None, 5, 300)] 0
__________________________________________________________________________________________________
dense_h (Dense) (None, 128) 2816 input_7[0][0]
__________________________________________________________________________________________________
dense_c (Dense) (None, 128) 2816 input_7[0][0]
__________________________________________________________________________________________________
baselayer1 (LSTM) (None, 5, 128) 219648 input_8[0][0]
dense_h[0][0]
dense_c[0][0]
__________________________________________________________________________________________________
baselayer2 (LSTM) (None, 128) 131584 baselayer1[0][0]
__________________________________________________________________________________________________
dense_var_layer (Dense) (None, 26) 3354 baselayer2[0][0]
__________________________________________________________________________________________________
pi_layer (Dense) (None, 26) 3354 baselayer2[0][0]
__________________________________________________________________________________________________
mean_layer (Dense) (None, 7800) 1006200 baselayer2[0][0]
__________________________________________________________________________________________________
variance_layer (Lambda) (None, 26) 0 dense_var_layer[0][0]
==================================================================================================
Total params: 1,369,772
Trainable params: 1,369,772
Non-trainable params: 0
__________________________________________________________________________________________________
However, when I try to run the training process, I get the following error:
ValueError: in user code:
<ipython-input-70-084e2be19035>:7 train_step *
loss = mdn_loss(y, pi_, mu_, var_)
<ipython-input-67-9a3cf3d4ccd2>:18 mdn_loss *
out = calc_pdf(y_true, mu, var)
<ipython-input-67-9a3cf3d4ccd2>:6 calc_pdf *
value = tf.subtract(y, mu)**2
.....
ValueError: Dimensions must be equal, but are 300 and 7800 for '{{node Sub}} = Sub[T=DT_FLOAT](y, model_15/mean_layer/BiasAdd)' with input shapes: [161,300], [161,7800].
It tells me that the shapes of the tensors passed to tf.subtract() inside calc_pdf() are incompatible: y is [161, 300] while mu is [161, 7800] (= 300 × 26), so the element-wise subtraction fails. The functions in question, taken from the original implementation, are:
import numpy as np
import tensorflow as tf

# Take note of how easy it is to write the loss function in
# new tensorflow eager mode (debugging the function becomes intuitive too)
def calc_pdf(y, mu, var):
    """Calculate component density"""
    value = tf.subtract(y, mu)**2
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value

def mdn_loss(y_true, pi, mu, var):
    """MDN Loss Function
    The eager mode in tensorflow 2.0 makes it extremely easy to write
    functions like these. It feels a lot more pythonic to me.
    """
    out = calc_pdf(y_true, mu, var)
    # multiply each component density by its mixing coefficient and sum
    out = tf.multiply(out, pi)
    out = tf.reduce_sum(out, 1, keepdims=True)
    # negative log-likelihood, with a small epsilon for numerical stability
    out = -tf.math.log(out + 1e-10)
    return tf.reduce_mean(out)
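The train_step in the traceback is only partially visible; reconstructed from the error message, it presumably does something like the following (the input order [c, x] and the @tf.function decorator are assumptions; pi_, mu_, var_ are the names visible in the traceback):

@tf.function
def train_step(x, c, y):
    with tf.GradientTape() as tape:
        pi_, mu_, var_ = model([c, x], training=True)
        loss = mdn_loss(y, pi_, mu_, var_)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss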
But I don't know how to fix this. I checked the original implementation (at the link above), which uses 4000 observations, 1 feature and 26 distributions, giving shapes [4000, 1] and [4000, 26] for this particular function, and it works fine. I expected it to work the same way for [161, 300] and [161, 7800], but it does not.
How can I fix this?
(I have already looked at similar questions about "Dimensions must be equal", but could not figure out how to make this work for this particular implementation.)
I can post additional information or code if this is not enough. Thanks a lot for any answer!
For an MDN model, the likelihood of each sample must be evaluated under every Gaussian pdf. To do that, I think you have to reshape the matrices (y_true and mu) and exploit broadcasting by adding 1 as a trailing dimension. For example:
def calc_pdf(y, mu, var):
    """Calculate component density via broadcasting"""
    y = tf.reshape(y, (-1, 300, 1))      # (batch, n_feat, 1); -1 keeps it batch-size agnostic
    mu = tf.reshape(mu, (-1, 300, 26))   # (batch, n_feat, k)
    var = tf.reshape(var, (-1, 1, 26))   # (batch, 1, k), one variance per component
    value = tf.subtract(y, mu)**2        # broadcasts to (batch, n_feat, k)
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value
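With that reshape, calc_pdf returns a (batch, 300, 26) tensor, so mdn_loss also has to reduce over both the feature axis and the mixture axis. One way to do this, sketched here under the assumption that each component is a diagonal Gaussian with a single shared variance across the 300 dimensions, is to sum the log-densities over the features before mixing:

def mdn_loss(y_true, pi, mu, var):
    pdf = calc_pdf(y_true, mu, var)                                   # (batch, n_feat, k)
    # joint log-density of each component: sum log-densities over the 300 features
    log_component = tf.reduce_sum(tf.math.log(pdf + 1e-10), axis=1)   # (batch, k)
    # log-sum-exp over components, weighted by the mixing coefficients
    out = -tf.reduce_logsumexp(log_component + tf.math.log(pi + 1e-10), axis=1)
    return tf.reduce_mean(out)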