Not able to get reasonable results from DenseVariational

Question

I am trying to solve a regression problem with the following dataset (a sinusoidal curve) of size 500.

First, I tried 2 dense layers with 10 units each:

model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='tanh'),
        tf.keras.layers.Dense(10, activation='tanh'),
        tf.keras.layers.Dense(1),
        tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1.))
    ])

trained with a negative log-likelihood loss as follows:

model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=neg_log_likelihood)
model.fit(x, y, epochs=50)
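
The question does not show how `neg_log_likelihood` is defined. A definition commonly used in TFP examples is `lambda y, rv_y: -rv_y.log_prob(y)`; assuming that is what is meant here, and given the fixed `scale=1.` in the `DistributionLambda`, the per-point loss is just half the squared error plus a constant. The pure-Python sketch below checks this numerically (`normal_nll` is a hypothetical helper, not part of the original code):

```python
import math

def normal_nll(y, loc, scale=1.0):
    """Negative log-density of Normal(loc, scale) evaluated at y."""
    z = (y - loc) / scale
    return 0.5 * z * z + math.log(scale) + 0.5 * math.log(2.0 * math.pi)

# With scale fixed at 1, the loss differs from 0.5 * squared error
# only by the constant 0.5 * log(2 * pi).
const = 0.5 * math.log(2.0 * math.pi)
for y, loc in [(0.0, 0.0), (1.0, 0.2), (-0.5, 0.7)]:
    assert abs(normal_nll(y, loc) - (0.5 * (y - loc) ** 2 + const)) < 1e-12
```

So with this output distribution the Dense model is effectively trained with a mean-squared-error objective.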

Resulting plot:

Next, I tried a similar setup with DenseVariational:

model = tf.keras.Sequential([
        tfp.layers.DenseVariational(
            10, activation='tanh', make_posterior_fn=posterior,
            make_prior_fn=prior, kl_weight=1/N, kl_use_exact=True),
        tfp.layers.DenseVariational(
            10, activation='tanh', make_posterior_fn=posterior,
            make_prior_fn=prior, kl_weight=1/N, kl_use_exact=True),
        tfp.layers.DenseVariational(
            1, activation='tanh', make_posterior_fn=posterior,
            make_prior_fn=prior, kl_weight=1/N, kl_use_exact=True),
        tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1.))
    ])

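Since the number of parameters roughly doubles, I tried increasing the dataset size and/or the number of epochs by up to 100 times, without success. The results usually look like the following.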

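My question is how to get results from DenseVariational layers that are comparable to those from Dense layers. I have also read that it can be sensitive to initial values. Here is the link to the full code. Any suggestions are welcome.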

Answer 1

You need to define a different surrogate posterior. In TensorFlow's Bayesian linear regression example https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Probabilistic_Layers_Regression.ipynb#scrollTo=VwzbWw3_CQ2z

the mean-field posterior is defined like this:

# Specify the surrogate posterior over `keras.layers.Dense` `kernel` and `bias`.
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
  n = kernel_size + bias_size
  c = np.log(np.expm1(1.))
  return tf.keras.Sequential([
      tfp.layers.VariableLayer(2 * n, dtype=dtype),
      tfp.layers.DistributionLambda(lambda t: tfd.Independent(
          tfd.Normal(loc=t[..., :n],
                     scale=1e-5 + 0.01*tf.nn.softplus(c + t[..., n:])),
          reinterpreted_batch_ndims=1)),
  ])

But note that I put a factor of 0.01 in front of the softplus to reduce the size of the standard deviation. Try this.
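
To make the effect of that factor concrete: the constant `c = log(expm1(1))` is chosen so that `softplus(c) = 1`. Assuming the `VariableLayer` starts at zeros (which I believe is its default initializer), the initial posterior standard deviation is therefore about `1e-5 + multiplier`. The stdlib-only sketch below reproduces that arithmetic; `initial_scale` is a hypothetical helper used only for illustration.

```python
import math

def softplus(x):
    """Numerically simple softplus: log(1 + exp(x))."""
    return math.log1p(math.exp(x))

# Same constant as `c` in posterior_mean_field; softplus(C) == 1.
C = math.log(math.expm1(1.0))

def initial_scale(multiplier, raw=0.0):
    """Posterior scale for a given raw (untransformed) variable value."""
    return 1e-5 + multiplier * softplus(C + raw)

# Compare the unscaled posterior with the scaled-down variants.
for m in (1.0, 0.01, 0.001):
    print(m, initial_scale(m))
```

Under that zero-initialization assumption, the unscaled posterior starts with a standard deviation of about 1 on every weight, while the 0.01 factor starts it at about 0.01.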

Even better than this is to use a sampled initialization like the one used by default in DenseFlipout https://www.tensorflow.org/probability/api_docs/python/tfp/layers/DenseFlipout?version=nightly

Here is the same initializer, but ready to be used with DenseVariational:

def random_gaussian_initializer(shape, dtype):
    n = int(shape / 2)
    loc_norm = tf.random_normal_initializer(mean=0., stddev=0.1)
    loc = tf.Variable(
        initial_value=loc_norm(shape=(n,), dtype=dtype)
    )
    scale_norm = tf.random_normal_initializer(mean=-3., stddev=0.1)
    scale = tf.Variable(
        initial_value=scale_norm(shape=(n,), dtype=dtype)
    )
    return tf.concat([loc, scale], 0)

Now just change the VariableLayer in the mean-field posterior to

tfp.layers.VariableLayer(2 * n, dtype=dtype, initializer=lambda shape, dtype: random_gaussian_initializer(shape, dtype), trainable=True)

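You are now sampling from a normal distribution with mean -3 and standard deviation 0.1 to feed into your softplus. At the mean, the scale of the mean-field posterior is Softplus(-3) = 0.048587352, so it is very small. With the sampling, all scales are initialized differently, but around that mean.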
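
As a rough check of those numbers, the following stdlib-only sketch draws raw scale values from N(-3, 0.1) and passes them through a plain softplus. It deliberately ignores the offset `c` and the 0.01 factor from `posterior_mean_field`, which would shrink the scales further if they were kept; the seed and sample count are arbitrary choices for illustration.

```python
import math
import random

def softplus(x):
    """log(1 + exp(x))."""
    return math.log1p(math.exp(x))

# Scale obtained when the raw value sits exactly at the mean of -3.
print(softplus(-3.0))

# Scales obtained when the raw values are sampled around that mean.
rng = random.Random(0)
raw = [rng.gauss(-3.0, 0.1) for _ in range(1000)]
scales = [softplus(r) for r in raw]
print(min(scales), max(scales))
```

Every weight therefore starts with its own small standard deviation rather than a shared constant.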

Answer 2

Following up on @Perd's answer, I tried lower standard deviations on the posterior.

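For this data and NN architecture, I could not get better results with the tanh activation. However, I was able to get the best results with the relu activation and scale=1e-5 + 0.001 * tf.nn.softplus(c + t[..., n:])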

The model seems to be very sensitive to hyperparameters. Below are the results for different values of the posterior scale.

For scale=1e-5 + 0.01 * tf.nn.softplus(c + t[..., n:])

For scale=1e-5 + 0.005 * tf.nn.softplus(c + t[..., n:])

For scale=1e-5 + 0.002 * tf.nn.softplus(c + t[..., n:])

For scale=1e-5 + 0.0015 * tf.nn.softplus(c + t[..., n:])

For scale=1e-5 + 0.001 * tf.nn.softplus(c + t[..., n:])

With the tanh activation, I still could not get good results.

Code Link

Answer 3

I had been struggling with the same problem, and it took me a while to realize the cause.

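The last layer of the Dense-NN has no activation function (tf.keras.layers.Dense(1)), whereas the last layer of the Variational-NN has tanh as its activation (tfp.layers.DenseVariational(1, activation='tanh', ...)). Removing it should fix the problem. I also observed that relu, and especially leaky-relu, outperforms tanh in this setup.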
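
One possible way to see why a tanh on the output layer can hurt (an illustration, not a diagnosis of the original model): tanh confines the predicted mean to (-1, 1), and reaching targets near the peaks of a sinusoid requires large pre-activations where the gradient is nearly zero. The stdlib-only sketch below tabulates this; `tanh_grad` is a hypothetical helper.

```python
import math

def tanh_grad(pre_activation):
    """Derivative of tanh at the given pre-activation."""
    return 1.0 - math.tanh(pre_activation) ** 2

# Pre-activation needed to output each target, and the gradient there.
for target in (0.5, 0.9, 0.99, 0.999):
    pre = math.atanh(target)
    print(f"target={target}: pre-activation={pre:.3f}, "
          f"gradient={tanh_grad(pre):.4f}")
```

With a linear output layer (no activation), the gradient with respect to the pre-activation is 1 everywhere and the output range is unrestricted.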


python

machine-learning

keras

tensorflow

tensorflow-probability