Pytorch nn.CrossEntropyLoss() always returns 0
I am building a multi-class Vision Transformer network. When I pass my values through my loss function, it always returns zero. My output layer consists of 37 dense layers, each with a softmax unit on top of it. criterion is created with nn.CrossEntropyLoss(). The output of criterion is 0.0 for every iteration. I am working in a Colab notebook. I printed the output and label of one iteration:
for output, label in zip(iter(ouputs_t), iter(labels_t)):
    loss += criterion(
        output,
        # reshape label from (Batch_Size) to (Batch_Size, 1)
        torch.reshape(label, (label.shape[0], 1))
    )
output: tensor([[0.1534],
[0.5797],
[0.6554],
[0.4066],
[0.2683],
[0.1773],
[0.7410],
[0.5136],
[0.5695],
[0.3970],
[0.4317],
[0.7216],
[0.8336],
[0.4517],
[0.4004],
[0.5963],
[0.3079],
[0.5956],
[0.3876],
[0.2327],
[0.7919],
[0.2722],
[0.3064],
[0.9779],
[0.8358],
[0.1851],
[0.2869],
[0.3128],
[0.4301],
[0.4740],
[0.6689],
[0.7588]], device='cuda:0', grad_fn=<UnbindBackward0>)
label: tensor([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[1.],
[1.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[1.],
[0.]], device='cuda:0')
My model:
class vit_large_patch16_224_multiTaskNet(nn.Module):
    def __init__(self, output_classes, frozen_feature_layers=False):
        super().__init__()
        vit_base_patch16_224 = timm.create_model('vit_large_patch16_224', pretrained=True)
        self.is_frozen = frozen_feature_layers
        # here we get all the modules(layers) before the fc layer at the end
        self.features = nn.ModuleList(vit_base_patch16_224.children())[:-1]
        self.features = nn.Sequential(*self.features)
        if frozen_feature_layers:
            self.freeze_feature_layers()
        # now lets add our new layers
        in_features = vit_base_patch16_224.head.in_features
        # it helps with performance. you can play with it
        # create more layers, play/experiment with them.
        self.fc0 = nn.Linear(in_features, 512)
        self.bn_pu = nn.BatchNorm1d(512, eps=1e-5)
        self.output_modules = nn.ModuleList()
        for i in range(output_classes):
            self.output_modules.append(nn.Linear(512, 1))
        # initialize all fc layers to xavier
        for m in self.modules():
            if isinstance(m, nn.Linear):
                torch.nn.init.xavier_normal_(m.weight, gain=1)

    def forward(self, input_imgs):
        output = self.features(input_imgs)
        final_cs_token = output[:, 0]
        output = self.bn_pu(F.relu(self.fc0(final_cs_token)))
        output_list = list()
        for output_modul in self.output_modules:
            output_list.append(torch.sigmoid(output_modul(output)))
        # Convert List to Tensor
        output_tensor = torch.stack(output_list)
        # (num_heads, batch, 1) -> (batch, num_heads, 1)
        output_tensor = torch.swapaxes(output_tensor, 0, 1)
        return output_tensor

    def _set_freeze_(self, status):
        for n, p in self.features.named_parameters():
            p.requires_grad = status
        # for m in self.features.children():
        #     for p in m.parameters():
        #         p.requires_grad = status

    def freeze_feature_layers(self):
        self._set_freeze_(False)

    def unfreeze_feature_layers(self):
        self._set_freeze_(True)
nn.CrossEntropyLoss returns 0 here because each of your output heads produces a single value: the softmax over a one-element class dimension is always 1, so the loss is -log(1) = 0 no matter what the target is.

You are in a multi-class classification scenario, which means you can consider your problem as c binary classifications done in parallel (where c is the total number of classes). The output_t logit tensor contains the values produced by your model's last linear layer, while target, the ground-truth tensor, contains the true class state of every instance in the batch. You can apply nn.BCEWithLogitsLoss since it works with multi-dimensional tensors out of the box:
With dummy inputs:
>>> output_t = torch.rand(47, 32, 1)
>>> target = torch.randint(0, 2, (47, 32, 1)).float()
Then initialize and call the loss function:
>>> loss = nn.BCEWithLogitsLoss()
>>> loss(output_t, target)
tensor(0.7246)
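For completeness, the all-zero loss can be reproduced without the model; the shapes below are made up to match the printouts above (32 samples, one output each):

>>> criterion = nn.CrossEntropyLoss()
>>> out = torch.rand(32, 1)                      # one logit per sample, like each head
>>> tgt = torch.randint(0, 2, (32, 1)).float()   # same shape as the reshaped label
>>> criterion(out, tgt).item() == 0.0            # zero regardless of the targets
True

Also note that nn.BCEWithLogitsLoss combines the sigmoid and the binary cross-entropy internally, so if you switch to it you would return the raw linear outputs from forward instead of torch.sigmoid(output_modul(output)). Below is a minimal sketch of how the loss computation could then look with the model from the question; the batch size, the dummy images, the 37 heads, and the assumption that the sigmoid has been removed from forward are all for illustration only:

import torch
import torch.nn as nn

# model defined in the question, assumed to be modified so that forward()
# returns raw logits of shape (batch, num_heads, 1), i.e. without torch.sigmoid
model = vit_large_patch16_224_multiTaskNet(output_classes=37)
criterion = nn.BCEWithLogitsLoss()

# dummy batch: 8 RGB images at 224x224 and one 0./1. target per head
input_imgs = torch.rand(8, 3, 224, 224)
labels_t = torch.randint(0, 2, (8, 37, 1)).float()

outputs_t = model(input_imgs)          # (8, 37, 1) raw logits
loss = criterion(outputs_t, labels_t)  # single call, no per-head loop needed
loss.backward()

If you prefer to keep the per-head loop from your snippet, the same criterion can be applied head by head; what matters is that it receives logits rather than sigmoid outputs, and that the loss is not nn.CrossEntropyLoss over a single logit.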