Pytorch nn.CrossEntropyLoss() always returns 0
I am building a multi-class Vision Transformer network. When I pass my values through my loss function, it always returns zero. My output layer consists of 37 dense layers, each with a softmax unit on top of it. criterion is created with nn.CrossEntropyLoss(). The output of criterion is 0.0 for every iteration. I am working in a Colab notebook. I printed the output and label of one iteration:
for output, label in zip(iter(ouputs_t), iter(labels_t)):
    loss += criterion(
        output,
        # reshape label from (Batch_Size) to (Batch_Size, 1)
        torch.reshape(label, (label.shape[0], 1))
    )
output: tensor([[0.1534],
[0.5797],
[0.6554],
[0.4066],
[0.2683],
[0.1773],
[0.7410],
[0.5136],
[0.5695],
[0.3970],
[0.4317],
[0.7216],
[0.8336],
[0.4517],
[0.4004],
[0.5963],
[0.3079],
[0.5956],
[0.3876],
[0.2327],
[0.7919],
[0.2722],
[0.3064],
[0.9779],
[0.8358],
[0.1851],
[0.2869],
[0.3128],
[0.4301],
[0.4740],
[0.6689],
[0.7588]], device='cuda:0', grad_fn=<UnbindBackward0>)
label: tensor([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[1.],
[1.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[1.],
[0.]], device='cuda:0')
My model:
class vit_large_patch16_224_multiTaskNet(nn.Module):
    def __init__(self, output_classes, frozen_feature_layers=False):
        super().__init__()
        vit_base_patch16_224 = timm.create_model('vit_large_patch16_224', pretrained=True)
        self.is_frozen = frozen_feature_layers
        # here we get all the modules(layers) before the fc layer at the end
        self.features = nn.ModuleList(vit_base_patch16_224.children())[:-1]
        self.features = nn.Sequential(*self.features)
        if frozen_feature_layers:
            self.freeze_feature_layers()
        # now lets add our new layers
        in_features = vit_base_patch16_224.head.in_features
        # it helps with performance. you can play with it
        # create more layers, play/experiment with them.
        self.fc0 = nn.Linear(in_features, 512)
        self.bn_pu = nn.BatchNorm1d(512, eps=1e-5)
        self.output_modules = nn.ModuleList()
        for i in range(output_classes):
            self.output_modules.append(nn.Linear(512, 1))
        # initialize all fc layers to xavier
        for m in self.modules():
            if isinstance(m, nn.Linear):
                torch.nn.init.xavier_normal_(m.weight, gain=1)

    def forward(self, input_imgs):
        output = self.features(input_imgs)
        final_cs_token = output[:, 0]
        output = self.bn_pu(F.relu(self.fc0(final_cs_token)))
        output_list = list()
        for output_modul in self.output_modules:
            output_list.append(torch.sigmoid(output_modul(output)))
        # Convert List to Tensor
        output_tensor = torch.stack(output_list)
        # (num_heads, batch, 1) -> (batch, num_heads, 1)
        output_tensor = torch.swapaxes(output_tensor, 0, 1)
        return output_tensor

    def _set_freeze_(self, status):
        for n, p in self.features.named_parameters():
            p.requires_grad = status
        # for m in self.features.children():
        #     for p in m.parameters():
        #         p.requires_grad = status

    def freeze_feature_layers(self):
        self._set_freeze_(False)

    def unfreeze_feature_layers(self):
        self._set_freeze_(True)
nn.CrossEntropyLoss returns 0 here because each of your output heads produces a single value: the softmax over a one-element class dimension is always 1, so the loss is -log(1) = 0 no matter what the target is.

You are in a multi-class classification scenario, which means you can consider your problem as c binary classifications done in parallel (where c is the total number of classes). The output_t logit tensor contains the values produced by your model's last linear layer, while target, the ground-truth tensor, contains the true class state of every instance in the batch. You can apply nn.BCEWithLogitsLoss since it works with multi-dimensional tensors out of the box:
With dummy inputs:
>>> output_t = torch.rand(47, 32, 1)
>>> target = torch.randint(0, 2, (47, 32, 1)).float()
Then initialize and call the loss function:
>>> loss = nn.BCEWithLogitsLoss()
>>> loss(output_t, target)
tensor(0.7246)
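For completeness, the all-zero loss can be reproduced without the model; the shapes below are made up to match the printouts above (32 samples, one output each):

>>> criterion = nn.CrossEntropyLoss()
>>> out = torch.rand(32, 1)                      # one logit per sample, like each head
>>> tgt = torch.randint(0, 2, (32, 1)).float()   # same shape as the reshaped label
>>> criterion(out, tgt).item() == 0.0            # zero regardless of the targets
True

Also note that nn.BCEWithLogitsLoss combines the sigmoid and the binary cross-entropy internally, so if you switch to it you would return the raw linear outputs from forward instead of torch.sigmoid(output_modul(output)). Below is a minimal sketch of how the loss computation could then look with the model from the question; the batch size, the dummy images, the 37 heads, and the assumption that the sigmoid has been removed from forward are all for illustration only:

import torch
import torch.nn as nn

# model defined in the question, assumed to be modified so that forward()
# returns raw logits of shape (batch, num_heads, 1), i.e. without torch.sigmoid
model = vit_large_patch16_224_multiTaskNet(output_classes=37)
criterion = nn.BCEWithLogitsLoss()

# dummy batch: 8 RGB images at 224x224 and one 0./1. target per head
input_imgs = torch.rand(8, 3, 224, 224)
labels_t = torch.randint(0, 2, (8, 37, 1)).float()

outputs_t = model(input_imgs)          # (8, 37, 1) raw logits
loss = criterion(outputs_t, labels_t)  # single call, no per-head loop needed
loss.backward()

If you prefer to keep the per-head loop from your snippet, the same criterion can be applied head by head; what matters is that it receives logits rather than sigmoid outputs, and that the loss is not nn.CrossEntropyLoss over a single logit.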