PyTorch multi-class: ValueError: Expected input batch_size (416) to match target batch_size (32)

I created a multi-class classification neural network. The training and validation iterators were created with the BigBucketIterator method with fields {'text_normalized_tweet': TEXT, 'label': LABEL}, where

TEXT = the tweet, LABEL = a float (with 3 values: 0, 1, 2)

Below is a dummy example of my neural network:

import torch.nn as nn

class MultiClassClassifer(nn.Module):
  #define all the layers used in model
  def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
    
    #Constructor
    super(MultiClassClassifer, self).__init__()

    #embedding layer
    self.embedding = nn.Embedding(vocab_size, embedding_dim)

    #dense layer
    self.hiddenLayer = nn.Linear(embedding_dim, hidden_dim)

    #Batch normalization layer
    self.batchnorm = nn.BatchNorm1d(hidden_dim)

    #output layer
    self.output = nn.Linear(hidden_dim, output_dim)

    #activation layer
    self.act = nn.Softmax(dim=1) #2d-tensor

    #initialize weights of embedding layer
    self.init_weights()

  def init_weights(self):

    initrange = 1.0
    
    self.embedding.weight.data.uniform_(-initrange, initrange)
  
  def forward(self, text, text_lengths):

    embedded = self.embedding(text)

    #packed sequence
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)

    tensor, batch_size = packed_embedded[0], packed_embedded[1]

    hidden_1 = self.batchnorm(self.hiddenLayer(tensor))

    return self.act(self.output(hidden_1))

Instantiating the model:

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 64
OUTPUT_DIM = 3

model = MultiClassClassifer(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)

When I call

text, text_lengths = batch.text_normalized_tweet
                
predictions = model(text, text_lengths).squeeze()

loss = criterion(predictions, batch.label)

it returns:

ValueError: Expected input batch_size (416) to match target batch_size (32).

model(text, text_lengths).squeeze() = torch.Size([416, 3])
batch.label = torch.Size([32])

I can see that the two objects have different sizes, but I don't know how to fix this.

You may find the Google Colab notebook here.

The shapes of each input/output tensor of my forward() method:

torch.Size([32, 10, 100]) #self.embedding(text)
torch.Size([320, 100]) #nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)
torch.Size([320, 64]) #self.batchnorm(self.hiddenLayer(tensor))
torch.Size([320, 3]) #self.act(self.output(hidden_1))
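For reference, here is a minimal sketch (with small stand-in dimensions, not the question's actual data) of why the batch dimension grows: pack_padded_sequence concatenates all non-padded timesteps of every sequence into one flat batch, so the output of the layers that follow has one row per token, not one row per example:

```python
import torch
import torch.nn as nn

# Toy batch: 4 sequences of max length 3, embedding dim 5
# (stand-ins for the question's 32 / 10 / 100)
embedded = torch.randn(4, 3, 5)
lengths = torch.tensor([3, 3, 2, 1])  # must be sorted descending by default

packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths, batch_first=True)

# packed.data stacks all non-padded timesteps into one long batch
print(embedded.shape)     # torch.Size([4, 3, 5])
print(packed.data.shape)  # torch.Size([9, 5]) -- 3+3+2+1 tokens, not 4 examples
```

With the question's shapes, 32 examples of length 10 flatten to 320 rows, which no longer matches the 32 labels.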

You should not apply the squeeze function after the forward pass; it serves no purpose there.
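As a quick check of that claim, squeeze() only removes dimensions of size 1, so it leaves a [N, 3] output untouched:

```python
import torch

preds = torch.randn(416, 3)
print(preds.squeeze().shape)                   # torch.Size([416, 3]) -- unchanged, no size-1 dims
print(torch.randn(32, 1, 3).squeeze().shape)   # torch.Size([32, 3]) -- size-1 dim removed
```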

With the squeeze function removed, as you can see, the final output has shape [320, 3], whereas [32, 3] is expected. One way to solve this is to average the embeddings obtained for each word right after self.embedding, as follows:

def forward(self, text, text_lengths):

    embedded = self.embedding(text)      # [batch, seq_len, embedding_dim]

    # average the word embeddings over the sequence dimension,
    # collapsing [batch, seq_len, embedding_dim] to [batch, embedding_dim]
    embedded = torch.mean(embedded, dim=1)

    # no pack_padded_sequence needed: the sequence dimension is gone,
    # so every layer below now sees one row per example
    hidden_1 = self.batchnorm(self.hiddenLayer(embedded))

    return self.act(self.output(hidden_1))