How is log_softmax() implemented to compute its value (and gradient) with better speed and numerical stability?

Question

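Both MXNet and PyTorch provide a special implementation for computing log(softmax()) that is faster and more numerically stable. However, I cannot find the actual Python implementation of this function, log_softmax(), in either package.

Can anyone explain how this is implemented or, better yet, point me to the relevant source code?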

Answer 1

You can find one of the CPU implementations here and a vectorized version here (this is the log version, called from vec_host_softmax_lastdim).

You can find a CUDA implementation here, which then calls softmax_warp_forward.

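They are all similar and differ only in syntax. As you can see, there is usually a flag that determines whether softmax is computed in log form, i.e. LogSoftMax rather than SoftMax.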
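To illustrate the flag idea in NumPy rather than C++/CUDA, here is a minimal sketch. The function name `softmax_lastdim` and its `log` parameter are hypothetical, chosen for illustration only; this is not the PyTorch or MXNet code, just the same pattern of one routine with a switch for the log variant.

```python
import numpy as np

def softmax_lastdim(x, log=False):
    """Softmax over the last axis; log=True returns log-softmax instead.

    Hypothetical helper for illustration, not a library function.
    """
    x = np.asarray(x, dtype=np.float64)
    # Shift by the per-row maximum so the largest exponent is exp(0) = 1.
    shifted = x - x.max(axis=-1, keepdims=True)
    exp_shifted = np.exp(shifted)
    total = exp_shifted.sum(axis=-1, keepdims=True)
    if log:
        # Log branch: subtract log(sum) instead of dividing by the sum.
        return shifted - np.log(total)
    return exp_shifted / total
```

Both branches share the max-shift and the sum of exponentials; only the last step differs, which is presumably why a single flag is enough.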

Answer 2

Numerical error:

>>> import numpy as np
>>> x = np.array([1, -10, 1000])
>>> np.exp(x) / np.exp(x).sum()
RuntimeWarning: overflow encountered in exp
RuntimeWarning: invalid value encountered in true_divide
Out[4]: array([ 0.,  0., nan])

There are two ways to avoid numerical errors when computing softmax:

Exp normalization:

def exp_normalize(x):
    b = x.max()
    y = np.exp(x - b)
    return y / y.sum()

>>> exp_normalize(x)
array([0., 0., 1.])

Log-sum-exp:

def log_softmax(x):
    c = x.max()
    logsumexp = np.log(np.exp(x - c).sum())
    return x - c - logsumexp
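As a counterpart to the `exp_normalize(x)` call above, the following sketch compares taking the log of the stable softmax with calling `log_softmax` directly on the same input. It repeats both definitions so it runs on its own; the `np.errstate` context is only there to silence the divide-by-zero warning from `log(0)`.

```python
import numpy as np

def exp_normalize(x):
    b = x.max()
    y = np.exp(x - b)
    return y / y.sum()

def log_softmax(x):
    c = x.max()
    logsumexp = np.log(np.exp(x - c).sum())
    return x - c - logsumexp

x = np.array([1.0, -10.0, 1000.0])

# Log of the stable softmax: the two small probabilities underflow to 0.0,
# so their logs become -inf even though no overflow occurred.
with np.errstate(divide="ignore"):
    naive = np.log(exp_normalize(x))

# Direct log-softmax never takes the log of an underflowed probability.
stable = log_softmax(x)

print(naive)   # entries: -inf, -inf, 0
print(stable)  # entries: -999, -1010, 0
```

This is the reason a dedicated log version is useful: the information lost to underflow in softmax cannot be recovered by applying log afterwards.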

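Note that a reasonable choice for b and c in the code above is max(x). With this choice, exp cannot overflow: after the shift, the largest value being exponentiated is 0.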
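The question also asks about the gradient. Differentiating y_i = x_i - log(sum_j exp(x_j)) gives dy_i/dx_j = [i == j] - softmax(x)_j, so for an upstream gradient g the backward pass is g - exp(y) * sum(g), which needs only the forward output y. Below is a minimal NumPy sketch of that formula with a finite-difference check. The name `log_softmax_backward` is hypothetical, and the code is not claimed to mirror any library kernel.

```python
import numpy as np

def log_softmax(x):
    c = x.max()
    return x - c - np.log(np.exp(x - c).sum())

def log_softmax_backward(grad_output, output):
    # Hypothetical helper: exp(output) recovers softmax(x) from the
    # forward result, so the input x is not needed again.
    return grad_output - np.exp(output) * grad_output.sum()

x = np.array([0.5, -1.0, 2.0])
g = np.array([1.0, 2.0, 3.0])  # arbitrary upstream gradient

y = log_softmax(x)
analytic = log_softmax_backward(g, y)

# Central finite differences of the scalar L(x) = sum(g * log_softmax(x)).
eps = 1e-6
numeric = np.zeros_like(x)
for j in range(x.size):
    step = np.zeros_like(x)
    step[j] = eps
    numeric[j] = (g @ log_softmax(x + step) - g @ log_softmax(x - step)) / (2 * eps)

print(np.abs(analytic - numeric).max())  # should be very small
```

Reusing the forward output this way avoids recomputing the exponentials of the shifted input in the backward pass.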

python

machine-learning

numerical-methods

mxnet

pytorch