Stack expects each tensor to be equal size
I am following the PyTorch speech command recognition tutorial and trying to adapt it to recognize my own set of 22 German sentences. In the tutorial, padding is applied to the audio tensors, but the labels are simply combined with torch.stack. As a result, when I start training the network I get this error:
RuntimeError: stack expects each tensor to be equal size, but got [456] at entry 0 and [470] at entry 1
I understand what the message says, but unfortunately, being new to PyTorch, I have not been able to implement padding for the sentences from scratch. So I would be glad for any hints and tips.
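To make the failure easy to reproduce, here is a minimal sketch with the tensor lengths taken from the error message (the label contents themselves are made up):

import torch

# two 1-D label tensors whose lengths match the error message above
a = torch.zeros(456, dtype=torch.long)
b = torch.zeros(470, dtype=torch.long)

torch.stack([a, b])
# RuntimeError: stack expects each tensor to be equal size,
# but got [456] at entry 0 and [470] at entry 1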
Here is the code for the collate_fn and pad_sequence functions:
def pad_sequence(batch):
    # Make all tensors in a batch the same length by padding with zeros
    batch = [item.t() for item in batch]
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    return batch.permute(0, 2, 1)

def collate_fn(batch):
    # A data tuple has the form:
    # waveform, label
    tensors, targets = [], []

    # Gather in lists, and encode labels as indices
    for waveform, label in batch:
        tensors += [waveform]
        targets += [label]

    # Group the list of tensors into a batched tensor
    tensors = pad_sequence(tensors)
    targets = torch.stack(targets)

    return tensors, targets
Once I started working with pad_sequence directly, I realized how simply it actually works: in my case I only need to pass it a list of encoded sentences (batch), and PyTorch automatically pads each of them to the length of the longest sequence in the batch. My code now looks like this:
def pad_AudioSequence(batch):
    # Make all tensors in a batch the same length by padding with zeros
    batch = [item.t() for item in batch]
    batch = torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0.)
    return batch.permute(0, 2, 1)

def pad_TextSequence(batch):
    return torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)

def collate_fn(batch):
    # A data tuple has the form:
    # waveform, label
    tensors, targets = [], []

    # Gather in lists, and encode labels as indices
    for waveform, label in batch:
        tensors += [waveform]
        targets += [label]

    # Group the list of tensors into a batched tensor
    tensors = pad_AudioSequence(tensors)
    targets = pad_TextSequence(targets)

    return tensors, targets
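For context, this collate_fn is meant to be handed to the DataLoader, which then pads both waveforms and labels per batch. A minimal sketch of the wiring, where my_dataset is a placeholder for any Dataset returning (waveform, encoded_label) pairs:

from torch.utils.data import DataLoader

# my_dataset is a placeholder: any Dataset yielding (waveform, encoded_label) tuples,
# where waveform has shape [channels, time] and encoded_label is a 1-D tensor
train_loader = DataLoader(
    my_dataset,
    batch_size=8,
    shuffle=True,
    collate_fn=collate_fn,  # pads both waveforms and labels in every batch
)

for waveforms, labels in train_loader:
    print(waveforms.shape)  # [batch, channels, longest waveform in the batch]
    print(labels.shape)     # [batch, longest label sequence in the batch]
    break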
For those who still don't quite understand how it works, here is a small example:
encDecClass2 = dummyEncoderDecoder()

sent1 = audioWorkerClass.sentences[4]  # wie viel Prozent hat der Akku noch?
sent2 = audioWorkerClass.sentences[5]  # Wie spät ist es?
sent3 = audioWorkerClass.sentences[6]  # Mach einen Timer für 5 Sekunden.

# encode sentences into tensors of numbers, representing words, using my own enc-dec class
sent1 = encDecClass2.encode(sent1)  # tensor([11, 94, 21, 94, 22, 94, 23, 94, 24, 94, 25, 94, 26, 94, 15, 94])
sent2 = encDecClass2.encode(sent2)  # tensor([27, 94, 28, 94, 12, 94, 29, 94, 15, 94])
sent3 = encDecClass2.encode(sent3)  # tensor([30, 94, 31, 94, 32, 94, 33, 94, 34, 94, 35, 94, 19, 94])

print(sent1.shape)  # torch.Size([16])
print(sent2.shape)  # torch.Size([10])
print(sent3.shape)  # torch.Size([14])

batch = []
# add the sentences to the batch as separate tensors
batch += [sent1]
batch += [sent2]
batch += [sent3]

# here pad_sequence is torch.nn.utils.rnn.pad_sequence, not the wrapper defined above
output = pad_sequence(batch, batch_first=True, padding_value=0)
print(f"{output}\n{output.shape}")
#############################################################################
# output:
# tensor([[11, 94, 21, 94, 22, 94, 23, 94, 24, 94, 25, 94, 26, 94, 15, 94],
# [27, 94, 28, 94, 12, 94, 29, 94, 15, 94, 0, 0, 0, 0, 0, 0],
# [30, 94, 31, 94, 32, 94, 33, 94, 34, 94, 35, 94, 19, 94, 0, 0]])
# torch.Size([3, 16])
#############################################################################
As you can see, all the tensors are padded with zeros up to the length of the longest of the three. The output has shape 3x16 because there are three sentences and the longest tensor in the batch contains 16 elements.