Pytorch LSTM: Target Dimension in Calculating Cross Entropy Loss
I've been trying to get an LSTM (an LSTM followed by a linear layer in a custom model) working in Pytorch, but I get the following error when calculating the loss:
Assertion `cur_target >= 0 && cur_target < n_classes' failed.
I defined the loss function with:
criterion = nn.CrossEntropyLoss()
and then called it with:
loss += criterion(output, target)
I was giving the target a dimension of [sequence_length, number_of_classes], and the output has dimension [sequence_length, 1, number_of_classes].
The examples I had followed seemed to be doing the same thing, but it differs from the Pytorch docs on cross entropy loss.
The docs say the target should be of dimension (N), where each value is 0 ≤ targets[i] ≤ C−1 and C is the number of classes. I changed the target to that form, but now I'm getting the error (the sequence length is 75, and there are 55 classes):
Expected target size (75, 55), got torch.Size([75])
I've tried looking at solutions for both of these errors, but still can't get it working. I'm confused as to the proper dimensions of target, as well as the actual meaning behind the first error (different searches gave very different meanings for the error, and none of the fixes worked).
Thank you
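As for the meaning of the first error: the assertion `cur_target >= 0 && cur_target < n_classes` fires when one of the target values falls outside the valid class-index range [0, C−1]. A minimal sketch that reproduces it (the exact exception type and message vary across PyTorch versions):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
output = torch.rand(3, 5)                  # 3 predictions over 5 classes
bad_target = torch.LongTensor([0, 2, 7])   # 7 is outside the valid range [0, 4]

try:
    criterion(output, bad_target)
except Exception as e:
    # current PyTorch raises an out-of-bounds error here;
    # older versions failed with the cur_target assertion instead
    print('invalid class index:', type(e).__name__)
```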
You can use squeeze() on your output tensor, which returns a tensor with all dimensions of size 1 removed.
This short code uses the shapes you mentioned in your question:
import torch
import torch.nn as nn

sequence_length = 75
number_of_classes = 55
# creates random tensor of your output shape
output = torch.rand(sequence_length, 1, number_of_classes)
# creates tensor with random targets
target = torch.randint(55, (75,)).long()
# define loss function and calculate loss
criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss)
This results in the error you described:
ValueError: Expected target size (75, 55), got torch.Size([75])
So using squeeze() on your output tensor solves your problem by giving it the correct shape.
Example with corrected shape:
sequence_length = 75
number_of_classes = 55
# creates random tensor of your output shape
output = torch.rand(sequence_length, 1, number_of_classes)
# creates tensor with random targets
target = torch.randint(55, (75,)).long()
# define loss function and calculate loss
criterion = nn.CrossEntropyLoss()
# apply squeeze() on output tensor to change shape from [75, 1, 55] to [75, 55]
loss = criterion(output.squeeze(), target)
print(loss)
Output:
tensor(4.0442)
Using squeeze() changes the tensor shape from [75, 1, 55] to [75, 55] so that the output and target shapes match!
There are also other methods you can use to reshape the tensor; what matters is that it has the shape [sequence_length, number_of_classes] instead of [sequence_length, 1, number_of_classes].
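For example, any of the following produce the [75, 55] shape; passing the dimension explicitly to squeeze(1) is the safer choice, since a plain squeeze() would also drop a leading dimension that happens to be 1:

```python
import torch

output = torch.rand(75, 1, 55)

a = output.squeeze(1)        # remove only dimension 1
b = output.view(75, 55)      # reshape by stating the target shape
c = output.reshape(75, 55)   # like view, but also handles non-contiguous tensors

print(a.shape, b.shape, c.shape)  # three times torch.Size([75, 55])
```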
Your targets should be a LongTensor, i.e. a tensor of type torch.long, containing the class indices. The shape here is [sequence_length].
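If your original target of shape [sequence_length, number_of_classes] was one-hot encoded (an assumption on my part, since the question doesn't show how it was built), you can recover the class indices CrossEntropyLoss expects with argmax over the class dimension:

```python
import torch

sequence_length = 75
number_of_classes = 55

# hypothetical one-hot target of shape [75, 55]
indices = torch.randint(number_of_classes, (sequence_length,))
one_hot = torch.nn.functional.one_hot(indices, num_classes=number_of_classes)

# argmax over the class dimension recovers the class-index vector
target = one_hot.argmax(dim=1).long()
print(target.shape)  # torch.Size([75])
```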
Edit:
Shapes from the example above when passed to the cross entropy function:
Output: torch.Size([75, 55])
Target: torch.Size([75])
Here is a more general example of what the output and target for CE should look like. In this case we assume we have 5 different target classes, and there are three examples for sequences of length 1, 2, and 3:
# init CE Loss function
criterion = nn.CrossEntropyLoss()
# sequence of length 1
output = torch.rand(1, 5)
# in this case the 1st class is our target; the index of the 1st class is 0
target = torch.LongTensor([0])
loss = criterion(output, target)
print('Sequence of length 1:')
print('Output:', output, 'shape:', output.shape)
print('Target:', target, 'shape:', target.shape)
print('Loss:', loss)
# sequence of length 2
output = torch.rand(2, 5)
# the targets here are the 1st class for the first element and the 2nd class for the second
target = torch.LongTensor([0, 1])
loss = criterion(output, target)
print('\nSequence of length 2:')
print('Output:', output, 'shape:', output.shape)
print('Target:', target, 'shape:', target.shape)
print('Loss:', loss)
# sequence of length 3
output = torch.rand(3, 5)
# targets here are the 1st class, the 2nd class, and the 2nd class again for the last element
target = torch.LongTensor([0, 1, 1])
loss = criterion(output, target)
print('\nSequence of length 3:')
print('Output:', output, 'shape:', output.shape)
print('Target:', target, 'shape:', target.shape)
print('Loss:', loss)
Output:
Sequence of length 1:
Output: tensor([[ 0.1956, 0.0395, 0.6564, 0.4000, 0.2875]]) shape: torch.Size([1, 5])
Target: tensor([ 0]) shape: torch.Size([1])
Loss: tensor(1.7516)
Sequence of length 2:
Output: tensor([[ 0.9905, 0.2267, 0.7583, 0.4865, 0.3220],
[ 0.8073, 0.1803, 0.5290, 0.3179, 0.2746]]) shape: torch.Size([2, 5])
Target: tensor([ 0, 1]) shape: torch.Size([2])
Loss: tensor(1.5469)
Sequence of length 3:
Output: tensor([[ 0.8497, 0.2728, 0.3329, 0.2278, 0.1459],
[ 0.4899, 0.2487, 0.4730, 0.9970, 0.1350],
[ 0.0869, 0.9306, 0.1526, 0.2206, 0.6328]]) shape: torch.Size([3, 5])
Target: tensor([ 0, 1, 1]) shape: torch.Size([3])
Loss: tensor(1.3918)
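One more case worth knowing about: nn.CrossEntropyLoss also accepts higher-dimensional input, but then the class dimension has to come second, i.e. output of shape [batch, classes, seq_len] with target of shape [batch, seq_len]. A quick sketch:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, num_classes, seq_len = 4, 5, 75
# note the class dimension is dim 1, not the last dim
output = torch.rand(batch_size, num_classes, seq_len)
target = torch.randint(num_classes, (batch_size, seq_len))

loss = criterion(output, target)
print(loss)  # a scalar tensor
```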
Hope this helps!