Using Softmax Activation function after calculating loss from BCEWithLogitsLoss (Binary Cross Entropy + Sigmoid activation)

Question

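I am going through a binary classification tutorial using PyTorch. Here, the last layer of the network is torch.nn.Linear() with just one neuron (makes sense), which gives us a single output value per sample, as pred=network(input_batch)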

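After that the choice of loss function is loss_fn=BCEWithLogitsLoss() (which is more numerically stable than using the softmax first and then calculating loss), which will apply the Softmax function to the output of the last layer to give us a probability. After that, it'll calculate the binary cross entropy to minimize the loss.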

loss=loss_fn(pred,true)

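My concern is that, after this, the author used torch.round(torch.sigmoid(pred))

Why would that be? I mean, I know it'll get the prediction probabilities into the range [0,1] and then round off the values with the default threshold of 0.5.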

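Isn't it better to use the sigmoid once after the last layer within the network rather than using a softmax and a sigmoid at 2 different places, given it's a binary classification??

Wouldn't it be better to just do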

out = self.linear(batch_tensor)
return self.sigmoid(out)

and then calculate the BCE loss and use argmax() for checking accuracy??

I am just curious whether this would be a valid strategy.

Answer 1

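You seem to be thinking of binary classification as a multi-class classification with two classes, but that is not quite correct when using the binary cross-entropy approach. Let's start by clarifying the goal of binary classification before looking at any implementation details.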

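Technically, there are two classes, 0 and 1, but instead of considering them as two separate classes, you can see them as opposites of each other. For example, you want to classify whether a Stack Overflow answer was helpful or not. The two classes would be "helpful" and "not helpful". Naturally, you would simply ask "Was the answer helpful?"; the negative aspect is left off, and if that wasn't the case, you could deduce that it was "not helpful". (Remember, it's a binary case, there is no middle ground.)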

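Therefore, your model only needs to predict a single class, but to avoid confusion with the actual two classes, that can be expressed as: the model predicts the probability that the positive case occurs. In the context of the previous example: what is the probability that the Stack Overflow answer was helpful?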

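Sigmoid gives you values in the range [0, 1], which are the probabilities. Now you need to decide when the model is confident enough for the prediction to be positive by defining a threshold. To make it balanced, the threshold is 0.5, so as long as the probability is greater than 0.5 it is positive (class 1: "helpful"), otherwise it's negative (class 0: "not helpful"), which is achieved by rounding (i.e. torch.round(torch.sigmoid(pred))).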
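As a small illustration of the thresholding step, the sketch below applies it to a handful of made-up logits (the values in `pred` are assumptions chosen for the example, not taken from the tutorial). It also shows that rounding the Sigmoid output agrees with checking the sign of the logit here, since sigmoid(x) > 0.5 exactly when x > 0.

```python
import torch

# Hypothetical raw model outputs (logits) for 5 samples, shape [5, 1].
pred = torch.tensor([[-2.0], [-0.1], [0.3], [1.5], [4.0]])

probs = torch.sigmoid(pred)    # probability of the positive class
labels = torch.round(probs)    # 1.0 where the probability is above 0.5, else 0.0

# Same decision taken directly on the logits.
labels_from_logits = (pred > 0).float()

print(probs.flatten())
print(labels.flatten())
```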

After that the choice of loss function is loss_fn=BCEWithLogitsLoss() (which is more numerically stable than using the softmax first and then calculating loss), which will apply the Softmax function to the output of the last layer to give us a probability.

Isn't it better to use the sigmoid once after the last layer within the network rather than using a softmax and a sigmoid at 2 different places, given it's a binary classification??

BCEWithLogitsLoss applies Sigmoid, not Softmax; there is no Softmax involved at all. From the nn.BCEWithLogitsLoss documentation:

This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
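To see why the combined form helps, here is a plain-Python sketch for a single logit. It is not PyTorch's actual implementation; the function names are hypothetical, and it uses Python's 64-bit floats (PyTorch defaults to 32-bit, where the problem shows up at smaller logits). The stable version uses the algebraically equivalent form max(x, 0) - x*y + log(1 + exp(-|x|)).

```python
import math

def naive_bce(logit, target):
    # Sigmoid first, then binary cross-entropy on the resulting probability.
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def stable_bce_with_logits(logit, target):
    # Same loss computed from the logit directly, so log(0) never occurs.
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For moderate logits both versions agree.
print(naive_bce(0.5, 1.0), stable_bce_with_logits(0.5, 1.0))

# For a large logit the Sigmoid saturates to exactly 1.0, so the naive
# version ends up taking log(0) and fails; the stable version does not.
print(stable_bce_with_logits(40.0, 0.0))
try:
    naive_bce(40.0, 0.0)
except ValueError as err:
    print("naive version failed:", err)
```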

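By not applying Sigmoid in the model, you get a more numerically stable version of the binary cross-entropy, but that means you have to apply the Sigmoid manually if you want to make an actual prediction outside of training.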
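Put together, that arrangement could look roughly like the sketch below. The class name `BinaryClassifier`, the feature size of 4, and the random data are assumptions made up for illustration; only `nn.Linear`, `nn.BCEWithLogitsLoss`, `torch.sigmoid` and `torch.round` come from the discussion above.

```python
import torch
from torch import nn

class BinaryClassifier(nn.Module):  # hypothetical model, not the tutorial's
    def __init__(self, in_features):
        super().__init__()
        self.linear = nn.Linear(in_features, 1)

    def forward(self, batch_tensor):
        # Return raw logits; no Sigmoid inside the model.
        return self.linear(batch_tensor)

torch.manual_seed(0)
network = BinaryClassifier(in_features=4)
loss_fn = nn.BCEWithLogitsLoss()

input_batch = torch.randn(8, 4)               # made-up inputs
true = torch.randint(0, 2, (8, 1)).float()    # float targets, same shape as pred

pred = network(input_batch)    # logits, shape [8, 1]
loss = loss_fn(pred, true)     # Sigmoid is applied inside the loss

# Outside of training: apply Sigmoid manually, then threshold at 0.5.
with torch.no_grad():
    predicted_labels = torch.round(torch.sigmoid(network(input_batch)))
accuracy = (predicted_labels == true).float().mean()
print(loss.item(), accuracy.item())
```

Keeping the Sigmoid out of `forward` means the same logits feed both the loss and the prediction step, with the Sigmoid applied exactly once in each path.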

[...] and use the argmax() for checking accuracy??

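Again, you are thinking about the multi-class scenario. You only have a single output class, i.e. the output has size [batch_size, 1]. Taking the argmax of that will always give you 0, because that is the only available class.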
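A quick check with made-up logits (the values below are assumptions for illustration) shows both ways argmax can mislead on a [batch_size, 1] output:

```python
import torch

# Hypothetical logits for 4 samples, shape [4, 1].
pred = torch.tensor([[-3.2], [0.7], [5.1], [-0.4]])

# argmax over the class dimension: only index 0 exists, so every row gives 0.
per_sample = torch.argmax(pred, dim=1)
print(per_sample)

# argmax without a dim flattens the tensor and returns the position of the
# largest logit in the batch, which is a sample index, not a class.
flat_index = torch.argmax(pred)
print(flat_index)
```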

neural-network

deep-learning

recurrent-neural-network

pytorch