What does the property losses of the Bayesian layers of TensorFlow Probability represent?

I am running the example code from Bayesian Neural Network implemented using TensorFlow Probability.

My question concerns the implementation of the ELBO loss used for variational inference. The ELBO is the sum of the two terms implemented in the code, namely neg_log_likelihood and kl. I am having trouble understanding how the kl term is implemented.

The model is defined as follows:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

with tf.name_scope("bayesian_neural_net", values=[images]):
  neural_net = tf.keras.Sequential()
  # Hidden layers: DenseFlipout layers with learned weight posteriors.
  for units in FLAGS.layer_sizes:
    layer = tfp.layers.DenseFlipout(units, activation=FLAGS.activation)
    neural_net.add(layer)
  neural_net.add(tfp.layers.DenseFlipout(10))  # output layer (10 classes)
  logits = neural_net(images)
  labels_distribution = tfd.Categorical(logits=logits)

The kl term is defined as follows:

kl = sum(neural_net.losses) / mnist_data.train.num_examples

I am not sure what neural_net.losses returns here, since no loss function was defined for neural_net. Clearly neural_net.losses returns some values, but I do not know what those returned values mean. Any comments on this?
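For reference, a quick way I can poke at it (TF1-style graph mode, using the model defined above) is something like this; with the default layer arguments there appears to be one entry per DenseFlipout layer:

# Just poking around to see what neural_net.losses contains.
for loss_tensor in neural_net.losses:
  print(loss_tensor)  # one tensor per DenseFlipout layer (default args)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  print(sess.run(neural_net.losses))  # concrete scalar values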

My guess is that it is an L2 norm, but I am not sure. If that is the case, we are still missing something. According to Appendix B of the VAE paper, the authors derive the KL term in closed form when the prior is a standard normal. It turns out to be very close to the L2 norm of the variational parameters, except for additional log-variance terms and a constant term. Any comments on this?
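For reference, here is a small sketch of the Appendix B result I have in mind (my own check, not from the example code): for a diagonal Gaussian posterior N(mu, sigma^2) and a standard-normal prior, the KL has the closed form 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2), i.e. roughly an L2 penalty on the means plus the extra log-variance and constant terms mentioned above:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

mu = np.array([0.3, -1.2], dtype=np.float32)    # variational means
sigma = np.array([0.5, 1.5], dtype=np.float32)  # variational stddevs

posterior = tfd.Normal(loc=mu, scale=sigma)
prior = tfd.Normal(loc=tf.zeros_like(mu), scale=tf.ones_like(sigma))

# Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions.
kl_closed_form = 0.5 * np.sum(mu**2 + sigma**2 - 1. - np.log(sigma**2))
kl_tfp = tf.reduce_sum(tfd.kl_divergence(posterior, prior))  # same value when evaluated in a session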

losses, as a property of a TensorFlow Keras Layer, represents side-effect computations such as regularizer penalties. Unlike regularizer penalties on specific TensorFlow variables, here the losses represent the KL divergence computation. Check out the implementation here as well as the example from the docstring:

We illustrate a Bayesian neural network with variational inference, assuming a dataset of features and labels.

  import tensorflow as tf
  import tensorflow_probability as tfp
  model = tf.keras.Sequential([
      tfp.layers.DenseFlipout(512, activation=tf.nn.relu),
      tfp.layers.DenseFlipout(10),
  ])
  logits = model(features)
  neg_log_likelihood = tf.nn.softmax_cross_entropy_with_logits(
      labels=labels, logits=logits)
  kl = sum(model.losses)  # KL terms contributed by the DenseFlipout layers
  loss = neg_log_likelihood + kl
  train_op = tf.train.AdamOptimizer().minimize(loss)

It uses the Flipout gradient estimator to minimize the Kullback-Leibler divergence up to a constant, also known as the negative Evidence Lower Bound. It consists of the sum of two terms: the expected negative log-likelihood, which we approximate via Monte Carlo, and the KL divergence, which is added via the regularizer terms that are arguments to the layer.
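To make that concrete, here is a minimal sketch (my own, not from the docstring) showing that, with the default arguments, each DenseFlipout layer appends one KL tensor to model.losses, and that the divergence is configurable via the kernel_divergence_fn argument (the default is written out explicitly below):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

features = tf.random_normal([32, 784])  # dummy batch of inputs
model = tf.keras.Sequential([
    tfp.layers.DenseFlipout(
        512, activation=tf.nn.relu,
        # Divergence added to layer.losses: analytic KL between the
        # variational posterior q and the prior p over the kernel weights.
        kernel_divergence_fn=lambda q, p, ignore: tfd.kl_divergence(q, p)),
    tfp.layers.DenseFlipout(10),
])
logits = model(features)   # calling the model populates model.losses
print(len(model.losses))   # one KL tensor per DenseFlipout layer, so 2 here
kl = sum(model.losses)     # the KL term of the (negative) ELBO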