How to correctly use CTC Loss with GRU in pytorch?
I am trying to create an ASR model. I am still learning, so I am just experimenting with a simple GRU:
MySpeechRecognition(
(gru): GRU(128, 128, num_layers=5, batch_first=True, dropout=0.5)
(dropout): Dropout(p=0.3, inplace=False)
(fc1): Linear(in_features=128, out_features=512, bias=True)
(fc2): Linear(in_features=512, out_features=28, bias=True)
)
It classifies each output frame as one of the possible letters + space + blank.
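For reference, a minimal sketch of a module that would produce the architecture printed above (the class body and forward logic are assumptions, since the original definition isn't shown):

import torch
import torch.nn as nn

class MySpeechRecognition(nn.Module):
    # Sketch only: layer sizes match the printout above,
    # but the forward pass is an assumed reconstruction.
    def __init__(self, n_feats=128, hidden=128, n_classes=28):
        super().__init__()
        self.gru = nn.GRU(n_feats, hidden, num_layers=5,
                          batch_first=True, dropout=0.5)
        self.dropout = nn.Dropout(p=0.3)
        self.fc1 = nn.Linear(hidden, 512)
        self.fc2 = nn.Linear(512, n_classes)

    def forward(self, x, h):
        # x: (batch, seq_len, n_feats)
        out, h = self.gru(x, h)
        out = self.dropout(out)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)  # (batch, seq_len, n_classes)
        return out, h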
Then I use the CTC loss function and the Adam optimizer:
lr = 5e-4
criterion = nn.CTCLoss(blank=28, zero_infinity=False)
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
In my training loop (I am only showing the problematic area):
output, h = mynet(specs, h)
print(output.size())
output = F.log_softmax(output, dim=2)
# (batch, seq_len, class) -> (seq_len, batch, class), as CTCLoss expects
output = output.transpose(0,1)
# calculate the loss and perform backprop
loss = criterion(output, labels, input_lengths, label_lengths)
loss.backward()
I get this error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-133-5e47e7b03a46> in <module>
42 output = output.transpose(0,1)
43 # calculate the loss and perform backprop
---> 44 loss = criterion(output, labels, input_lengths, label_lengths)
45 loss.backward()
46 # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
/opt/conda/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, log_probs, targets, input_lengths, target_lengths)
1309 def forward(self, log_probs, targets, input_lengths, target_lengths):
1310 return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction,
-> 1311 self.zero_infinity)
1312
1313 # TODO: L1HingeEmbeddingCriterion
/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, reduction, zero_infinity)
2050 """
2051 return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction),
-> 2052 zero_infinity)
2053
2054
RuntimeError: blank must be in label range
I am not sure why I am getting this error. I tried changing the labels to
labels.float()
Thank you.
Your model predicts 28 classes, so the output of the model has size [batch_size, seq_len, 28] (or [seq_len, batch_size, 28] for the log probabilities given to the CTC loss). In nn.CTCLoss you set blank=28, which means that the blank label is the class with index 28. To get the log probability of the blank label you would index it as output[:, :, 28], but that doesn't work, because that index is out of range: the valid indices are 0 to 27.

The last class in your output is at index 27, hence it should be blank=27:
criterion = nn.CTCLoss(blank=27, zero_infinity=False)
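As a quick sanity check, here is a self-contained snippet with dummy tensors (all shapes and lengths are made up for illustration) showing that the corrected blank=27 setup runs end to end:

import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CTCLoss(blank=27, zero_infinity=False)

batch_size, seq_len, n_classes = 4, 50, 28
# Dummy network output; in the real code this comes from mynet(specs, h)
output = torch.randn(batch_size, seq_len, n_classes, requires_grad=True)
log_probs = F.log_softmax(output, dim=2).transpose(0, 1)  # (seq_len, batch, class)

# Targets must be integer class indices (a LongTensor, not floats),
# drawn from 0..26 so they never collide with the blank index 27.
labels = torch.randint(0, 27, (batch_size, 10), dtype=torch.long)
input_lengths = torch.full((batch_size,), seq_len, dtype=torch.long)
label_lengths = torch.full((batch_size,), 10, dtype=torch.long)

loss = criterion(log_probs, labels, input_lengths, label_lengths)
loss.backward()  # no "blank must be in label range" error

This also shows why labels.float() could not help: CTC targets are class indices and must stay integer. The original error was purely about the blank index falling outside the 0 to 27 range of the 28 output classes.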