How to compute Cross Entropy Loss for sequences

Question

I have a sequence continuation/prediction task (input: a sequence of class indices, output: a sequence of class indices), and I am using PyTorch.

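My neural network returns a tensor of shape (batch_size, sequence_length, numb_classes), where each entry is a score for the class with that index being the next class in the sequence. The targets in my training data have shape (batch_size, sequence_length) (simply the sequence of ground-truth class indices).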

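I want to use the cross-entropy loss.

My question: how do I use the cross-entropy loss function here? What input shapes does it require?

Thanks!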

Answer 1

The documentation page for nn.CrossEntropyLoss states this explicitly:

Input: shape (C), (N, C) or (N, C, d_1, d_2, ..., d_K) with K >= 1 in the case of K-dimensional loss.
Target: If containing class indices, shape (), (N) or (N, d_1, d_2, ..., d_K) with K >= 1 in the case of K-dimensional loss where each value should be between [0, C). If containing class probabilities, same shape as the input and each value should be between [0, 1].

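To be crystal clear, "input" refers to your model's output predictions, while "target" is the label tensor. In a nutshell, the target must have one fewer dimension than the input. The dimension the target lacks is the class dimension, which in the input holds the logit value for each class. We usually say the target is in dense format: it contains only the class indices corresponding to the true labels.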
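As a minimal sketch of that shape relationship in the simplest (N, C) case: the sizes and tensor values below are made up for illustration, and the manual computation is only there to show which entries of the class dimension the loss reads.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
N, C = 3, 5

logits = torch.randn(N, C)        # "input": one logit per class, shape (N, C)
target = torch.tensor([1, 0, 4])  # "target": one class index per sample, shape (N,)

# Default reduction is the mean over the N samples.
loss = F.cross_entropy(logits, target)

# Same quantity by hand: negative log-softmax picked at the true class index.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(N), target].mean()

assert torch.allclose(loss, manual)
```

The target has no class axis because each index simply selects one entry along the input's class axis.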

The example you give corresponds to the following use case:

#input = (batch_size, sequence_length, numb_classes)
#target = (batch_size, sequence_length)

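This needs to match the case #input = (N, C, d_1) and #target = (N, d_1). In other words, you need to permute the axes, or swap two axes, of your input tensor so that it has shape (batch_size, numb_classes, sequence_length), i.e. (N, C, d_1). You can do this with either torch.Tensor.transpose or torch.Tensor.permute: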

>>> input.permute(0,2,1)

or

>>> input.transpose(1,2)
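
Putting the pieces together, here is a hedged end-to-end sketch. The sizes are hypothetical, random tensors stand in for the model output and labels, and the variable is named `logits` rather than `input` to avoid shadowing the Python builtin. The flattening variant is not part of the answer above; it is included only as an equivalent alternative under the default mean reduction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, sequence_length, numb_classes = 4, 7, 10

# Stand-in for the model output: raw logits, shape (N, d_1, C).
logits = torch.randn(batch_size, sequence_length, numb_classes)
# Stand-in for the labels: class indices in [0, C), shape (N, d_1).
target = torch.randint(0, numb_classes, (batch_size, sequence_length))

criterion = nn.CrossEntropyLoss()

# Option 1 (from the answer): move the class axis to position 1 -> (N, C, d_1).
loss_permuted = criterion(logits.permute(0, 2, 1), target)

# Option 2 (alternative, an assumption of this sketch): merge batch and
# sequence axes -> input (N * d_1, C), target (N * d_1,).
loss_flat = criterion(logits.reshape(-1, numb_classes), target.reshape(-1))

# With the default reduction="mean", both average over all N * d_1 positions.
assert torch.allclose(loss_permuted, loss_flat)
```

Note that `permute` and `transpose` return views, and the loss accepts them directly, so no explicit `.contiguous()` call is needed here.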
