Negative values in log likelihood of a bivariate Gaussian
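I am trying to implement a loss function that minimizes the negative log likelihood of the ground truth (x, y) under the predicted bivariate Gaussian distribution parameters. I am implementing this in TensorFlow.
Here is the code: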
import numpy as np
import tensorflow as tf

def tf_2d_normal(self, x, y, mux, muy, sx, sy, rho):
    '''
    Function that returns the negative log of the PDF of a 2D normal distribution
    params:
    x : input x points
    y : input y points
    mux : mean of the distribution in x
    muy : mean of the distribution in y
    sx : std dev of the distribution in x
    sy : std dev of the distribution in y
    rho : Correlation factor of the distribution
    '''
    # eq 3 in the paper
    # and eq 24 & 25 in Graves (2013)
    # Calculate (x - mux) and (y - muy)
    # (tf.sub, tf.mul, tf.div and tf.log are pre-1.0 names; current names are used here)
    normx = tf.subtract(x, mux)
    normy = tf.subtract(y, muy)
    # Calculate sx*sy
    sxsy = tf.multiply(sx, sy)
    # Calculate the exponential factor
    z = (tf.square(tf.divide(normx, sx)) + tf.square(tf.divide(normy, sy))
         - 2 * tf.divide(tf.multiply(rho, tf.multiply(normx, normy)), sxsy))
    negRho = 1 - tf.square(rho)
    # Numerator
    result = tf.exp(tf.divide(-z, 2 * negRho))
    # Normalization constant
    denom = 2 * np.pi * tf.multiply(sxsy, tf.sqrt(negRho))
    # Negative log of the PDF
    result = -tf.math.log(tf.divide(result, denom))
    return result
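When I train, I can see the loss value decreasing, but it goes well below 0. I understand this is presumably because we are minimizing the 'negative' likelihood. Even though the loss keeps decreasing, I can't get accurate results. Can someone help verify whether the loss function code I have written is correct?
Is this kind of loss behavior also desirable for training neural nets (specifically RNNs)?
Thanks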
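A note on the sign: a loss below 0 is not by itself evidence of a bug. A probability density, unlike a probability, can exceed 1 when the standard deviations are small, and then -log(pdf) is negative. The following is a minimal sketch in plain Python on scalars; the function name `bivariate_pdf` and the sample numbers are illustrative assumptions, not taken from the code above.

```python
import math

def bivariate_pdf(x, y, mux, muy, sx, sy, rho):
    """Density of a bivariate Gaussian at (x, y); scalar inputs only."""
    dx = (x - mux) / sx
    dy = (y - muy) / sy
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    one_minus_rho2 = 1.0 - rho * rho
    norm_const = 2.0 * math.pi * sx * sy * math.sqrt(one_minus_rho2)
    return math.exp(-z / (2.0 * one_minus_rho2)) / norm_const

# Evaluate at the mean with small standard deviations (illustrative values).
peak = bivariate_pdf(0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.0)
nll = -math.log(peak)
print(peak > 1.0, nll < 0.0)  # True True
```

With sx = sy = 0.1 the density at the mean is 1 / (2π · 0.01), roughly 15.9, so its negative log is about -2.77. A model that predicts tight, accurate distributions will therefore drive this loss below zero.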
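I see you've found the sketch-rnn code from Magenta; I'm working on something similar. I found this piece of code not to be stable by itself. You'll need to stabilize it using constraints, so the tf_2d_normal code can't be used or interpreted in isolation. NaNs and Infs will start appearing all over the place if your data isn't normalized properly, either in advance or in your loss function.
Below is a more stable loss function version I'm building with Keras. There may be some redundancies in here, and it may not be perfect for your needs, but I found it to be working and you can test and adapt it. I included some inline comments on how large negative log values can arise: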
import numpy as np
from keras import backend as K
from keras.backend import epsilon

def r3_bivariate_gaussian_loss(true, pred):
    """
    Rank 3 bivariate gaussian loss function
    Returns results of eq # 24 of http://arxiv.org/abs/1308.0850
    :param true: truth values with at least [mu1, mu2, sigma1, sigma2, rho]
    :param pred: values predicted from a model with the same shape requirements as truth values
    :return: the log of the summed max likelihood
    """
    x_coord = true[:, :, 0]
    y_coord = true[:, :, 1]
    mu_x = pred[:, :, 0]
    mu_y = pred[:, :, 1]

    # exponentiate the sigmas and also make correlative rho between -1 and 1.
    # eq. # 21 and 22 of http://arxiv.org/abs/1308.0850
    # analogous to https://github.com/tensorflow/magenta/blob/master/magenta/models/sketch_rnn/model.py#L326
    sigma_x = K.exp(K.abs(pred[:, :, 2]))
    sigma_y = K.exp(K.abs(pred[:, :, 3]))
    # avoid drifting to -1 or 1 to prevent NaN; you will have to tweak this
    # multiplier value to suit the shape of your data
    rho = K.tanh(pred[:, :, 4]) * 0.1

    norm1 = K.log(1 + K.abs(x_coord - mu_x))
    norm2 = K.log(1 + K.abs(y_coord - mu_y))

    variance_x = K.softplus(K.square(sigma_x))
    variance_y = K.softplus(K.square(sigma_y))
    s1s2 = K.softplus(sigma_x * sigma_y)  # very large if sigma_x and/or sigma_y are very large

    # eq 25 of http://arxiv.org/abs/1308.0850
    z = ((K.square(norm1) / variance_x) +
         (K.square(norm2) / variance_y) -
         (2 * rho * norm1 * norm2 / s1s2))  # z → -∞ if rho * norm1 * norm2 → ∞ and/or s1s2 → 0
    neg_rho = 1 - K.square(rho)  # → 0 if rho → {1, -1}
    numerator = K.exp(-z / (2 * neg_rho))  # → ∞ if z → -∞ and/or neg_rho → 0
    denominator = (2 * np.pi * s1s2 * K.sqrt(neg_rho)) + epsilon()  # → 0 if s1s2 → 0 and/or neg_rho → 0
    pdf = numerator / denominator  # → ∞ if denominator → 0 and/or if numerator → ∞
    return K.log(K.sum(-K.log(pdf + epsilon())))  # → -∞ if pdf → ∞
Hope you find this of some value.
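One common way to sidestep the exp-then-log round trip, where the exponential underflows to 0 for large z and the log then blows up, is to compute the negative log likelihood directly in log space. The sketch below is plain Python on scalars and is not the method of either listing above; `bivariate_nll` and the `rho_limit` clamp are hypothetical names introduced for illustration, and it assumes the model outputs log standard deviations and an unconstrained correlation. The same arithmetic should carry over to tensor ops.

```python
import math

def bivariate_nll(x, y, mux, muy, log_sx, log_sy, rho_raw, rho_limit=0.999):
    """Negative log likelihood of (x, y) under a bivariate Gaussian.

    Takes log standard deviations and an unconstrained correlation so that
    sx, sy > 0 and |rho| < 1 hold by construction.
    """
    sx = math.exp(log_sx)
    sy = math.exp(log_sy)
    rho = rho_limit * math.tanh(rho_raw)  # keeps 1 - rho^2 away from 0
    dx = (x - mux) / sx
    dy = (y - muy) / sy
    z = dx * dx + dy * dy - 2.0 * rho * dx * dy
    one_minus_rho2 = 1.0 - rho * rho
    # -log(pdf) = z / (2 (1 - rho^2)) + log(2 pi) + log sx + log sy + 0.5 log(1 - rho^2)
    return (z / (2.0 * one_minus_rho2)
            + math.log(2.0 * math.pi)
            + log_sx + log_sy
            + 0.5 * math.log(one_minus_rho2))

# A point far from the mean: exp(-z / 2) would underflow to 0.0 here,
# but the log-space form stays finite.
far = bivariate_nll(100.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

# A point at the mean of a tight distribution: the loss is negative.
near = bivariate_nll(0.0, 0.0, 0.0, 0.0, math.log(0.1), math.log(0.1), 0.0)
```

Here `far` is finite (5000 + log 2π) even though the density itself is too small to represent, and `near` is negative because the density at the mean exceeds 1. Neither case needs an epsilon added to the density.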