Cross Entropy in PyTorch

The cross entropy formula:

H(p, q) = -\sum_x p(x) log(q(x))

But why does the following code give loss = 0.7437 instead of loss = 0 (since 1 * log(1) = 0)?

import torch
import torch.nn as nn

output = torch.FloatTensor([0, 0, 0, 1]).view(1, -1)
target = torch.LongTensor([3])

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)

Your understanding is correct, but PyTorch does not compute cross entropy in that way. PyTorch uses the following formula:

loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j])))
               = -x[class] + log(\sum_j exp(x[j]))

Since in your scenario x = [0, 0, 0, 1] and class = 3, if you evaluate the expression above you get:

loss(x, class) = -1 + log(exp(0) + exp(0) + exp(0) + exp(1))
               = 0.7437

PyTorch uses the natural logarithm here.
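
For example, a minimal sketch that evaluates this formula directly with torch on the question's input (x = [0, 0, 0, 1], class = 3):

import torch

x = torch.tensor([0., 0., 0., 1.])   # the raw scores from the question
cls = 3                              # the target class index

# loss(x, class) = -x[class] + log(\sum_j exp(x[j]))
loss = -x[cls] + torch.log(torch.exp(x).sum())
print(loss)  # tensor(0.7437)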

In your example you are treating the output [0, 0, 0, 1] as probabilities, as the mathematical definition of cross entropy requires. But PyTorch treats them as raw outputs that do not need to sum to 1, and it first converts them into probabilities, for which it uses the softmax function.

So H(p, q) becomes:

H(p, softmax(output))

Converting the output [0, 0, 0, 1] into probabilities:

softmax([0, 0, 0, 1]) = [0.1749, 0.1749, 0.1749, 0.4754]

hence:

-log(0.4754) = 0.7437
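
The same number can be reproduced step by step with torch.nn.functional (a small sketch along the lines above, using the question's output and target class 3):

import torch
import torch.nn.functional as F

output = torch.tensor([[0., 0., 0., 1.]])   # shape (1, 4): one sample, four classes
probs = F.softmax(output, dim=1)            # [0.1749, 0.1749, 0.1749, 0.4754]
loss = -torch.log(probs[0, 3])              # -log of the target-class probability
print(loss)                                 # tensor(0.7437)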

I would like to add an important note, as this often leads to confusion.

Softmax is not a loss function, nor is it really an activation function. It has a very specific task: it is used in multi-class classification to normalize the scores over the given classes. By doing that we get probabilities for each class that sum up to 1.

Softmax is combined with the Cross-Entropy-Loss to calculate the loss of a model.

Unfortunately, because this combination is so common, it is often abbreviated. Some use the term Softmax-Loss, whereas PyTorch calls it only Cross-Entropy-Loss.
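
As a small sketch of that equivalence (the shapes and random data here are just for illustration), nn.LogSoftmax followed by nn.NLLLoss should give the same value as nn.CrossEntropyLoss applied to the raw scores:

import torch
import torch.nn as nn

torch.manual_seed(0)
scores = torch.randn(4, 5)               # 4 samples, 5 classes (raw, unnormalized scores)
target = torch.randint(0, 5, (4,))

log_softmax = nn.LogSoftmax(dim=1)
nll = nn.NLLLoss()
ce = nn.CrossEntropyLoss()

print(nll(log_softmax(scores), target))  # same value ...
print(ce(scores, target))                # ... as this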

Here I give the full formula to manually compute PyTorch's CrossEntropyLoss. There is a precision problem that you will see later; please post an answer if you know the exact reason.

First, understand how NLLLoss works. CrossEntropyLoss is then very similar: it is just NLLLoss with a softmax inside.

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

def compute_nllloss_manual(x, y0):
    """
    x is a tensor with shape (batch_size, C).
    Note: the official example uses log_softmax(some vector) as x, so it becomes CELoss.
    y0 is a tensor with shape (batch_size,), whose entries are integers from 0 to C-1.
    Furthermore, for C > 1 classes, the other classes are ignored (see the comment below).
    """
    loss = 0.
    n_batch, n_class = x.shape
    # print(n_class)
    for x1,y1 in zip(x,y0):
        class_index = int(y1.item())
        loss = loss + x1[class_index] # other class terms, ignore.
    loss = - loss/n_batch
    return loss

We see from this formula that it is not the standard NLLLoss as usually prescribed, because the "other class" terms are ignored (see the comment in the code). Also, remember that PyTorch often processes data in batches. In the following code, we randomly initiate 1000 batches to verify that the formula is correct up to 15 decimal places.

torch.manual_seed(0)
precision = 15

batch_size=10
C = 10

N_iter = 1000
n_correct_nll = 0

criterion = nn.NLLLoss()
for i in range(N_iter):
    x = torch.rand(size=(batch_size,C)).to(torch.float)
    y0 = torch.randint(0,C,size=(batch_size,))

    nll_loss = criterion(x,y0)
    manual_nll_loss = compute_nllloss_manual(x,y0)
    if i==0:
        print('NLLLoss:')
        print('module:%s'%(str(nll_loss)))
        print('manual:%s'%(str(manual_nll_loss)))

    nll_loss_check = np.abs((nll_loss- manual_nll_loss).item())<10**-precision
    if nll_loss_check: n_correct_nll+=1

print('percentage NLLLoss correctly computed:%s'%(str(n_correct_nll/N_iter*100)))

I got output like the following:

NLLLoss:
module:tensor(-0.4783)
manual:tensor(-0.4783)
percentage NLLLoss correctly computed:100.0

So far so good: 100% of the computations are correct. Now let us compute CrossEntropyLoss manually with the following.

def compute_crossentropyloss_manual(x, y0):
    """
    x is a tensor with shape (batch_size, C).
    y0 is a tensor with shape (batch_size,), whose entries are integers from 0 to C-1.
    """
    loss = 0.
    n_batch, n_class = x.shape
    # print(n_class)
    for x1,y1 in zip(x,y0):
        class_index = int(y1.item())
        loss = loss + torch.log(torch.exp(x1[class_index])/(torch.exp(x1).sum()))
    loss = - loss/n_batch
    return loss

And then repeat the procedure for 1000 randomly initiated batches.

torch.manual_seed(0)
precision = 15

batch_size=10
C = 10

N_iter = 1000
n_correct_CE = 0

criterion2 = nn.CrossEntropyLoss()
for i in range(N_iter):
    x = torch.rand(size=(batch_size,C)).to(torch.float)
    y0 = torch.randint(0,C,size=(batch_size,))

    CEloss = criterion2(x,y0)
    manual_CEloss = compute_crossentropyloss_manual(x,y0)
    if i==0:
        print('CrossEntropyLoss:')
        print('module:%s'%(str(CEloss)))
        print('manual:%s'%(str(manual_CEloss)))

    CE_loss_check = np.abs((CEloss- manual_CEloss).item())<10**-precision
    if CE_loss_check: n_correct_CE+=1

print('percentage CELoss correctly computed :%s'%(str(n_correct_CE/N_iter*100)))

The result is

CrossEntropyLoss:
module:tensor(2.3528)
manual:tensor(2.3528)
percentage CELoss correctly computed :81.39999999999999

I got 81.4% of the computations correct up to 15 decimal places. Most likely exp() and log() introduce some precision problems, but I do not know exactly how.
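
One possible mitigation (a sketch only, not a verified explanation of the discrepancy): compute the per-sample term as x1[class] - logsumexp(x1) with torch.logsumexp, so the exponentials are never explicitly materialized and then re-logged; this may agree with the module more often.

def compute_crossentropyloss_manual_stable(x, y0):
    """
    Same as compute_crossentropyloss_manual, but using
    x1[class_index] - logsumexp(x1) for each sample.
    """
    loss = 0.
    n_batch, n_class = x.shape
    for x1, y1 in zip(x, y0):
        class_index = int(y1.item())
        loss = loss + x1[class_index] - torch.logsumexp(x1, dim=0)
    loss = -loss / n_batch
    return loss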

The combination of nn.LogSoftmax and nn.NLLLoss is equivalent to using nn.CrossEntropyLoss. This terminology is a particularity of PyTorch, as nn.NLLLoss in fact computes the cross entropy but with log-probability predictions as inputs, whereas nn.CrossEntropyLoss takes scores (sometimes called logits). Technically, nn.NLLLoss is the cross entropy between the Dirac distribution, which puts all mass on the target, and the predicted distribution given by the log-probability inputs.

PyTorch's CrossEntropyLoss expects unbounded scores (interpretable as logits / log-odds) as input, not probabilities (as cross entropy is traditionally defined).
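
To make that concrete, here is a small sketch with made-up numbers: if you pass probabilities directly, CrossEntropyLoss applies softmax to them again, so you do not get the textbook cross entropy; if you already have probabilities, their log should go to nn.NLLLoss instead.

import torch
import torch.nn as nn

probs = torch.tensor([[0.1, 0.2, 0.7]])   # already a probability distribution
target = torch.tensor([2])

ce = nn.CrossEntropyLoss()
nll = nn.NLLLoss()

print(ce(probs, target))                  # not -log(0.7): softmax is applied again
print(nll(torch.log(probs), target))      # -log(0.7) = 0.3567, the textbook value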