Tensorflow ValueError: Dimensions must be equal: LSTM+MDN

I am trying to build a next-word prediction model using an LSTM plus a mixture density network (MDN), based on this implementation (https://www.katnoria.com/mdn/).

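Input: 300-dimensional word vectors * window size (5), plus a 21-dimensional array (c) representing the document's topic distribution, which is used to train the initial hidden state.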

Output: mixing coefficients * num_gaussians, variances * num_gaussians, means * num_gaussians * 300 (vector size)

For an experimental set of 161 observations, x.shape, y.shape, c.shape give me:

(TensorShape([161, 5, 300]), TensorShape([161, 300]), TensorShape([161, 21]))

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LSTM, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.math import exp

# n_feat is size of word vector
n_feat = 300
window = 5
l = (window, n_feat)
hidden_state_dim = 21

# Number of gaussians to represent the multimodal distribution
k = 26

# Initial
mlp_inp = Input(shape=(hidden_state_dim,))
mlp_dense_h = Dense(128, activation='relu', name="dense_h")(mlp_inp)
mlp_dense_c = Dense(128, activation='relu', name="dense_c")(mlp_inp)

# Network
input = Input(shape=l)
layer1 = LSTM(128, return_sequences=True, name='baselayer1')(input, initial_state=[mlp_dense_h, mlp_dense_c])
layer2 = LSTM(128, name='baselayer2')(layer1)

# Mean
mu = Dense((n_feat * k), activation=None, name='mean_layer')(layer2)
# variance (should be greater than 0 so we exponentiate it)
var_layer = Dense(k, activation=None, name='dense_var_layer')(layer2)
var = Lambda(lambda x: exp(x), output_shape=(k,), name='variance_layer')(var_layer)
# mixing coefficient should sum to 1.0
pi = Dense(k, activation='softmax', name='pi_layer')(layer2)

Below is my model's .summary():

Model: "model_12"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_7 (InputLayer)            [(None, 21)]         0                                            
__________________________________________________________________________________________________
input_8 (InputLayer)            [(None, 5, 300)]     0                                            
__________________________________________________________________________________________________
dense_h (Dense)                 (None, 128)          2816        input_7[0][0]                    
__________________________________________________________________________________________________
dense_c (Dense)                 (None, 128)          2816        input_7[0][0]                    
__________________________________________________________________________________________________
baselayer1 (LSTM)               (None, 5, 128)       219648      input_8[0][0]                    
                                                                 dense_h[0][0]                    
                                                                 dense_c[0][0]                    
__________________________________________________________________________________________________
baselayer2 (LSTM)               (None, 128)          131584      baselayer1[0][0]                 
__________________________________________________________________________________________________
dense_var_layer (Dense)         (None, 26)           3354        baselayer2[0][0]                 
__________________________________________________________________________________________________
pi_layer (Dense)                (None, 26)           3354        baselayer2[0][0]                 
__________________________________________________________________________________________________
mean_layer (Dense)              (None, 7800)         1006200     baselayer2[0][0]                 
__________________________________________________________________________________________________
variance_layer (Lambda)         (None, 26)           0           dense_var_layer[0][0]            
==================================================================================================
Total params: 1,369,772
Trainable params: 1,369,772
Non-trainable params: 0
__________________________________________________________________________________________________

However, when I try to run the training process, I get the following error:

ValueError: in user code:

    <ipython-input-70-084e2be19035>:7 train_step  *
        loss = mdn_loss(y, pi_, mu_, var_)
    <ipython-input-67-9a3cf3d4ccd2>:18 mdn_loss  *
        out = calc_pdf(y_true, mu, var)
    <ipython-input-67-9a3cf3d4ccd2>:6 calc_pdf  *
        value = tf.subtract(y, mu)**2
.....
ValueError: Dimensions must be equal, but are 300 and 7800 for '{{node Sub}} = Sub[T=DT_FLOAT](y, model_15/mean_layer/BiasAdd)' with input shapes: [161,300], [161,7800].

It tells me there is a problem with the dimensions of the tensors passed to tf.subtract() in calc_pdf():

# Take a note how easy it is to write the loss function in 
# new tensorflow eager mode (debugging the function becomes intuitive too)

def calc_pdf(y, mu, var):
    """Calculate component density"""
    value = tf.subtract(y, mu)**2
    value = (1/tf.math.sqrt(2 * np.pi * var)) * tf.math.exp((-1/(2*var)) * value)
    return value


def mdn_loss(y_true, pi, mu, var):
    """MDN Loss Function
    The eager mode in tensorflow 2.0 makes it extremely easy to write 
    functions like these. It feels a lot more pythonic to me.
    """
    out = calc_pdf(y_true, mu, var)
    # multiply with each pi and sum it
    out = tf.multiply(out, pi)
    out = tf.reduce_sum(out, 1, keepdims=True)
    out = -tf.math.log(out + 1e-10)
    return tf.reduce_mean(out)

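But I don't know how to solve this. I checked the original implementation (in the link above), which has 4000 observations, 1 feature, and 26 distributions; the tensors passed to that function have shapes [4000, 1] and [4000, 26], and it works fine. I feel it should also work for [161,300] and [161,7800], but it doesn't.

How can I fix this?

(I have already checked similar questions about "Dimensions must be equal", but couldn't figure out how to make it work for this particular implementation.)

If this is not enough, I can post additional information or code. Thank you very much for your answers!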

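For an MDN model, the likelihood of each sample has to be computed with all of the Gaussian pdfs. To do that, I think you have to reshape the matrices (y_true and mu) and take advantage of broadcasting by adding a dimension of size 1 as the last dimension. For example: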

def calc_pdf(y, mu, var):
    """Calculate component density"""
    y = tf.reshape(y, (161, 300, 1))
    mu = tf.reshape(mu, (161, 300, 26))
    value = tf.subtract(y, mu)**2
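
Carrying the same reshape-and-broadcast idea through the whole loss could look like the sketch below. It is not taken from the linked implementation. It assumes each component is an isotropic Gaussian over the 300 dimensions (one shared variance per component, which matches the k variance outputs of the model in the question), uses -1 instead of the hard-coded batch size of 161, and works in log space, because a product of 300 per-dimension densities can underflow float32. The n_feat and k defaults are the values from the question.

```python
import numpy as np
import tensorflow as tf

N_FEAT = 300  # word-vector size, from the question
K = 26        # number of mixture components, from the question


def mdn_loss(y_true, pi, mu, var, n_feat=N_FEAT, k=K):
    """Negative log-likelihood of an isotropic Gaussian mixture (assumption).

    y_true: [batch, n_feat]      pi:  [batch, k]
    mu:     [batch, n_feat * k]  var: [batch, k]
    """
    y = tf.expand_dims(y_true, axis=-1)    # [batch, n_feat, 1]
    mu = tf.reshape(mu, (-1, n_feat, k))   # [batch, n_feat, k]
    # Broadcasting compares y against every component mean;
    # summing over the feature axis gives one squared distance per component.
    sq_dist = tf.reduce_sum(tf.square(y - mu), axis=1)  # [batch, k]
    # Log density of each component, with one shared variance per component.
    log_pdf = (-0.5 * n_feat * tf.math.log(2.0 * np.pi * var)
               - sq_dist / (2.0 * var))                 # [batch, k]
    # log(sum_k pi_k * N_k), computed stably with logsumexp.
    log_mix = tf.reduce_logsumexp(tf.math.log(pi + 1e-10) + log_pdf, axis=1)
    return -tf.reduce_mean(log_mix)
```

With this version, the call in the training step, mdn_loss(y, pi_, mu_, var_), stays the same, and the result is a scalar.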