Invalid argument error in TensorFlow 2 with a self-defined loss function, although everything seems to be correct

I am currently using TensorFlow 2 to train models that provide not only point predictions for time series but also distribution parameters (e.g. mean and variance). To do this, I created a custom layer and modified the loss function to optimize the corresponding parameters. For the one-dimensional case with a single predicted time series, this works well.
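
For reference, here is a minimal sketch of what this univariate setup can look like. This is my reconstruction rather than the exact code from the directory linked below; it assumes targets of shape (batch_size, 1) and uses tf.stack in place of the expand_dims/concat pattern of the multivariate layer:

import tensorflow as tf
import tensorflow_probability as tfp

def negative_normdist_layer(x):
    # Split the two predicted parameters
    mu, sigma = tf.unstack(x, num=2, axis=-1)
    # Softplus keeps the scale parameter positive
    sigma = tf.keras.activations.softplus(sigma)
    # Re-join into a (batch_size, 2) tensor
    return tf.stack((mu, sigma), axis=-1)

def negative_normdist_loss(y_true, y_pred):
    mu, sigma = tf.unstack(y_pred, num=2, axis=-1)
    dist = tfp.distributions.Normal(loc=mu, scale=sigma)
    # Negative log likelihood, averaged over the batch
    return tf.reduce_mean(-dist.log_prob(tf.squeeze(y_true, axis=-1)))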

For the case with two time series, I wanted to try to predict the correlation as well, using the function MultivariateNormalFullCovariance from tensorflow_probability. But with this I get the following error:

InvalidArgumentError:  Input matrix must be square.
     [[node negative_normdist_loss_2/MultivariateNormalFullCovariance/init/Cholesky (defined at d:\_programming\python\virtualenvs\tensorflow-gpu-2\lib\site-packages\tensorflow_probability\python\distributions\mvn_full_covariance.py:194) ]] [Op:__inference_train_function_1133]

Errors may have originated from an input operation.
Input Source operations connected to node negative_normdist_loss_2/MultivariateNormalFullCovariance/init/Cholesky:
 negative_normdist_loss_2/MultivariateNormalFullCovariance/init/covariance_matrix (defined at d:\_programming\python\virtualenvs\tensorflow-gpu-2\lib\site-packages\tensorflow_probability\python\distributions\mvn_full_covariance.py:181)

Function call stack:
train_function

I understand that something is wrong with the dimensions of the input, but unfortunately I have not been able to pinpoint the exact error. (The covariance matrix is already square, even though it contains the same parameter twice.)

The code itself is somewhat complex, so I have uploaded a working (univariate) example and the non-working (multivariate) example, together with sample data, to this directory:

https://drive.google.com/drive/folders/1IIAtKDB8paWV0aFVFALDUAiZTCqa5fAN?usp=sharing

For a better overview, I have also copied the essential routines below:

def negative_normdist_layer_2(x):
    # Get the number of dimensions of the input
    num_dims = len(x.get_shape())
    # Separate the parameters
    mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(x, num=5, axis=-1)
    # Add one dimension to make the right shape
    mu1 = tf.expand_dims(mu1, -1)
    mu2 = tf.expand_dims(mu2, -1)
    sigma11 = tf.expand_dims(sigma11, -1)
    sigma12 = tf.expand_dims(sigma12, -1)
    sigma22 = tf.expand_dims(sigma22, -1)
    # Apply a softplus to make positive
    sigma11 = tf.keras.activations.softplus(sigma11)
    sigma22 = tf.keras.activations.softplus(sigma22)
    # Join back together again
    out_tensor = tf.concat((mu1, mu2, sigma11, sigma12, sigma22), axis=num_dims-1)
    return out_tensor

def negative_normdist_loss_2(y_true, y_pred):
    # Separate the parameters
    mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(y_pred, num=5, axis=-1)
    # Add one dimension to make the right shape
    mu1 = tf.expand_dims(mu1, -1)
    mu2 = tf.expand_dims(mu2, -1)
    sigma11 = tf.expand_dims(sigma11, -1)
    sigma12 = tf.expand_dims(sigma12, -1)
    sigma22 = tf.expand_dims(sigma22, -1)
    # Calculate the negative log likelihood
    dist = tfp.distributions.MultivariateNormalFullCovariance(
        loc = [mu1, mu2], 
        covariance_matrix = [[sigma11, sigma12], [sigma12, sigma22]]
    )
    nll = tf.reduce_mean(-dist.log_prob(y_true))
    return nll

# Define inputs with predefined shape
input_shape = lookback // step, float_data.shape[-1]
inputs = Input(shape=input_shape)

# Build network with some predefined architecture
output1 = Flatten()(inputs)
output2 = Dense(32)(output1)

# Predict the parameters of a negative normdist distribution
outputs = Dense(5)(output2)
distribution_outputs = Lambda(negative_normdist_layer_2)(outputs)

# Construct model
model_norm_2 = Model(inputs=inputs, outputs=distribution_outputs)

opt = Adam()
model_norm_2.compile(loss = negative_normdist_loss_2, optimizer = opt)

history_norm_2 = model_norm_2.fit_generator(train_gen_mult,
                                            steps_per_epoch=500,
                                            epochs=20,
                                            validation_data=val_gen_mult,
                                            validation_steps=val_steps)

The operating system I am using is Windows 10, and the Python version is 3.6. All libraries listed in the example code are up to date, including tensorflow-gpu.

I would be very grateful if the exact cause of the error could be identified and a solution found.

The mean and covariance parameters have to be transposed, because according to the documentation of MultivariateNormalFullCovariance they are expected to have shapes (batch_size, 2) and (batch_size, 2, 2) for a problem of dimension 2. In the original loss, loc = [mu1, mu2] is converted to a tensor of shape (2, batch_size, 1) and covariance_matrix = [[sigma11, sigma12], [sigma12, sigma22]] to one of shape (2, 2, batch_size, 1); since the last two dimensions are treated as the matrix dimensions and (batch_size, 1) is not square, the Cholesky decomposition fails with the error above. Even with the shapes fixed, there were problems with the inversion of the covariance matrix, despite the layer that makes the diagonal terms positive (a positive diagonal does not guarantee positive definiteness). You can use MultivariateNormalTriL instead, which takes a lower triangular matrix, so that inverting the covariance is no longer a problem (keeping the softplus):

def negative_normdist_loss_2(y_true, y_pred):
    # Separate the parameters
    mu1, mu2, sigma11, sigma12, sigma22 = tf.unstack(y_pred, num=5, axis=-1)
    # Stack the means into shape (batch_size, 2)
    mu = tf.transpose([mu1, mu2], perm=[1, 0])
    # Build the lower triangular scale matrix with shape (batch_size, 2, 2)
    sigma_tril = tf.transpose([[sigma11, tf.zeros_like(sigma11)],
                               [sigma12, sigma22]], perm=[2, 0, 1])
    # Negative log likelihood of the observed pairs, averaged over the batch
    dist = tfp.distributions.MultivariateNormalTriL(loc=mu, scale_tril=sigma_tril)
    nll = tf.reduce_mean(-dist.log_prob(y_true))
    return nll
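
As a quick sanity check, the fixed loss can be evaluated on random tensors. This is only a sketch: the batch size of 4 is arbitrary, and the layer from the question is reused so that the softplus is applied to the diagonal terms, as in the real model:

import tensorflow as tf
import tensorflow_probability as tfp

raw = tf.random.normal((4, 5))                   # 5 raw parameters per sample
y_pred = negative_normdist_layer_2(raw)          # softplus on sigma11 and sigma22
y_true = tf.random.normal((4, 2))                # one observation per time series
print(negative_normdist_loss_2(y_true, y_pred))  # scalar negative log likelihood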

However, I would be curious about the idea behind this. It corresponds to a kind of unsupervised approach, which is interesting: the data allows you to estimate the mean and covariance parameters via a somewhat unconventional cost function, but it is not clear what you can do with them afterwards.