Learning a Categorical Variable with TensorFlow Probability

Question

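I want to use TFP to write a neural network whose output is the vector of probabilities of a categorical variable with 3 classes, and train it with the negative log-likelihood.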

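As my first steps with TF and TFP, I started with a toy model: the input layer has a single unit that receives a null (all-zero) input, and the output layer has 3 units with a softmax activation. The idea is that the biases should learn the logarithms of the probabilities (up to an additive constant).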
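To illustrate the claim that the biases only need to match the log-probabilities up to an additive constant, here is a minimal NumPy sketch. The `softmax` helper and the `shift` value are illustrative and not part of the original code.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

true_p = np.array([0.1, 0.7, 0.2])
shift = 3.5  # arbitrary additive constant (illustrative)

# Softmax is invariant to adding the same constant to every logit,
# so log(p) + c maps back to p for any c.
recovered = softmax(np.log(true_p) + shift)
print(np.allclose(recovered, true_p))
```

So any bias vector of the form log(true_p) + c would be a valid solution for the toy model.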

My code is below. true_p holds the true parameters that I use to generate the data and want to learn, and learned_p is what I get from the NN.

import numpy as np
import tensorflow as tf
from tensorflow import keras
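# nll comes from a local module not shown in the post; assumed to be the
# negative log-likelihood, e.g. lambda y, dist: -dist.log_prob(y)
from functions import nll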

from tensorflow.keras.optimizers import SGD
import tensorflow.keras.layers as layers
import tensorflow_probability as tfp
tfd = tfp.distributions

# params
true_p = np.array([0.1, 0.7, 0.2])
n_train = 1000

# training data
x_train = np.array(np.zeros(n_train)).reshape((n_train,))
y_train = np.array(np.random.choice(len(true_p), size=n_train, p=true_p)).reshape((n_train,))

# model
input_layer = layers.Input(shape=(1,))
p_layer = layers.Dense(len(true_p), activation=tf.nn.softmax)(input_layer)
p_y = tfp.layers.DistributionLambda(tfd.Categorical)(p_layer)

model_p = keras.models.Model(inputs=input_layer, outputs=p_y)
model_p.compile(SGD(), loss=nll)

# training
hist_p = model_p.fit(x=x_train, y=y_train, batch_size=100, epochs=3000, verbose=0)

# check result
learned_p = np.round(model_p.layers[1].call(tf.constant([0], shape=(1, 1))).numpy(), 3)
learned_p

With this setup, I get the result:

>>> learned_p
array([[0.005, 0.989, 0.006]], dtype=float32)

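I overestimate the second class and cannot tell the first and third apart. Worst of all, if I plot the probabilities at the end of each epoch, they appear to converge monotonically to the vector [0, 1, 0], which makes no sense (it seems to me the gradient should push in the opposite direction once I start overestimating).

I really can't figure out what is going on, but it feels like I am doing something wrong. Any ideas? Thanks for your help!

For the record, I also tried other optimizers such as Adam and Adagrad and played with the hyperparameters, with no success.

I am using Python 3.7.9, TensorFlow 2.3.1 and TensorFlow Probability 0.11.1.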

Answer 1

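I believe the default argument of Categorical is not a vector of probabilities but a vector of logits (values you would take the softmax of to get probabilities). This helps maintain precision in internal Categorical computations such as log_prob. I think you can simply remove the softmax activation and it should work. Please update if it doesn't!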
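A rough NumPy sketch of why the original model drifts toward [0, 1, 0], assuming the softmax output is indeed reinterpreted as logits: the distribution that enters the loss is then a softmax of a softmax. Because the inner probabilities lie in [0, 1], the effective probability of any of the 3 classes cannot exceed e / (e + 2), roughly 0.576. For these class frequencies the optimizer therefore keeps pushing the inner output toward [0, 1, 0] to get as close as it can to 0.7. The `softmax` helper and variable names are illustrative.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - np.max(z))
    return e / e.sum()

true_p = np.array([0.1, 0.7, 0.2])

# Inner softmax output the network is driven toward (a vertex of the simplex).
inner = np.array([0.0, 1.0, 0.0])

# Reinterpreting probabilities as logits applies a second softmax.
effective = softmax(inner)

# Largest class probability reachable when the "logits" are confined to [0, 1].
upper_bound = np.e / (np.e + 2.0)

print(np.round(effective, 3))
print(round(upper_bound, 3))
```

Even at this extreme, the effective probability of the second class stays below the true value of 0.7, which is consistent with the monotone convergence described in the question.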

EDIT: alternatively, you can replace tfd.Categorical with

lambda p: tfd.Categorical(probs=p)

but you would lose the aforementioned precision gain. I just wanted to clarify that passing probs is an option, just not the default.
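The precision point can be seen without TFP. The following NumPy sketch, with illustrative float32 logits chosen to be extreme, compares taking the log of softmax probabilities against computing log-probabilities directly from the logits. It only shows the kind of underflow that working with logits avoids; it is not a description of how TFP is implemented internally.

```python
import numpy as np

logits = np.array([0.0, -200.0], dtype=np.float32)  # illustrative extreme logits

# Route 1: convert to probabilities first, then take the log.
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()
with np.errstate(divide="ignore"):
    log_from_probs = np.log(probs)  # the tiny probability underflowed to 0

# Route 2: stay in log space (log-softmax via log-sum-exp).
shifted = logits - logits.max()
log_from_logits = shifted - np.log(np.sum(np.exp(shifted)))

print(log_from_probs)
print(log_from_logits)
```

In the first route the log-probability of the unlikely class becomes negative infinity, while the second route keeps it finite.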

tensorflow-probability

tensorflow2.0