How do I fix the error where the target batch size does not match the input batch size when using the CrossEntropyLoss function?

I am training a CNN. When I build the loss with CrossEntropyLoss and train on the dataset, the error tells me that the batch sizes do not match. This is the main training code:

net = SimpleConvolutionalNetwork()

train_history, val_history = train(net, batch_size=32, n_epochs=10, learning_rate=0.001)

plot_losses(train_history, val_history)

This is the neural network code:

class SimpleConvolutionalNetwork(nn.Module):

  # Q: why is the spatial size of the input unchanged after relu??
  
  def __init__(self) -> None:
      super(SimpleConvolutionalNetwork, self).__init__()

      # define the convolutional layer: 3 input channels -> 18 output channels
      self.conv1 = nn.Conv2d(3, 18, kernel_size=3, stride=1, padding=1)

      # define pooling layer with max-pooling function
      self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

      # define FCL and output layer by Linear function
      self.fc1 = nn.Linear(18*16*16, 64)
      self.fc2 = nn.Linear(64, 10)

  # Q: where does the pooling layer go??

  def forward(self, x):
    # input shape: 3 (channels) * 32 * 32 (each channel is 32*32)
    # filter with conv1 defined in the constructor,
    # then apply relu to the filtered x
    x = F.relu(self.conv1(x))

    # now let 18*32*32 -> 18*16*16
    x = x.view(-1, 18*16*16)

    # map 18*16*16 (4608 in total) -> 64:
    # the fully-connected layer first, then relu the output again
    x = F.relu(self.fc1(x))

    # 64 -> 10 finally
    x = self.fc2(x)
    return x

In the train function, the error arises where the loss is computed. Since the full context is long, the main part is shown below:

def train(net, batch_size, n_epochs, learning_rate):
...
  # load the training dataset
  train_loader = get_train_loader(batch_size)

  # get validation dataset
  val_loader = get_val_loader(batch_size)

  # number of mini-batches per epoch
  n_minibatches = len(train_loader)

  # create the loss function and the optimizer
  criterion, optimizer = createLossAndOptimizer(net, learning_rate)

  train_history = []
  val_history = []

  training_start_time = time.time()
  best_error = np.inf
  best_model_path = "best_model_path"

  # GPU if possible
  net = net.to(device)

  for epoch in range(n_epochs):

    running_loss = 0.0
    print_every = n_minibatches
    start_time = time.time()
    total_train_loss = 0.0

    # step1: training the datasets
    for i, (inputs, labels) in enumerate(train_loader):
      inputs, labels = inputs.to(device), labels.to(device)

      optimizer.zero_grad()

      # forward + backward + optimize
      outputs = net(inputs)
      loss = criterion(outputs, labels)
      loss.backward()
      optimizer.step()

      # print statistics
      running_loss += loss.item()
      total_train_loss += loss.item()

      # print every 10th of epoch
      if (i + 1) % (print_every + 1) == 0:    
        print("Epoch {}, {:d}% \t train_loss: {:.2f} took: {:.2f}s".format(
          epoch + 1, int(100 * (i + 1) / n_minibatches), running_loss / print_every,
          time.time() - start_time))
        running_loss = 0.0
        start_time = time.time()

    train_history.append(total_train_loss / len(train_loader))
...

The loss-function constructor and the dataset loading look like this:

def createLossAndOptimizer(net, learning_rate=0.001):

  # define a cross-entropy loss function:
  criterion = nn.CrossEntropyLoss()

  # build the optimizer (Adam) from the network parameters
  # and the learning rate

  optimizer = opt.Adam(net.parameters(), lr=learning_rate)
  return criterion, optimizer

def get_train_loader(batch_size):
  return th.utils.data.DataLoader(train_set, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)

def get_val_loader(batch_size):
  return th.utils.data.DataLoader(train_set, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)

But the error tells me that the input batch size is larger than the target batch size:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-07b692e7a2bb> in <module>()
    173 net = SimpleConvolutionalNetwork()
    174 
--> 175 train_history, val_history = train(net, batch_size=32, n_epochs=10, learning_rate=0.001)
    176 
    177 plot_losses(train_history, val_history)

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844     if size_average is not None or reduce is not None:
   2845         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   2847 
   2848 

ValueError: Expected input batch_size (128) to match target batch_size (32).

I suspect it is because the input batch ends up 4 times the size of 'labels' (128 = 32 * 4), so I must have set some parameter wrongly, but I don't know how to fix it. Thanks for your answers.
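Update: a dummy forward pass confirms the factor of 4 (random data standing in for the real loader):

import torch as th

net = SimpleConvolutionalNetwork()
dummy = th.randn(32, 3, 32, 32)  # fake batch: 32 images, 3 channels, 32x32
print(net(dummy).shape)          # torch.Size([128, 10]) -- 4x the batch size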

In the forward method of SimpleConvolutionalNetwork, after applying conv1 the tensor x has shape (batch_size, 18, 32, 32). So after x = x.view(-1, 18 * 16 * 16) the shape becomes (batch_size * 4, 18 * 16 * 16), and since the fully-connected layers applied afterwards do not change this new batch dimension, the output has shape (batch_size * 4, 10). My suggestion is to apply the pooling right after the convolution, for example:

 x = F.relu(self.conv1(x))  # after that x will have shape (batch_size, 18, 32, 32) 
 x = self.pool(x)           # after that x will have shape (batch_size, 18, 16, 16)

The forward pass will then return a tensor of shape (batch_size, 10), and the batch size mismatch error will no longer occur.
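Put together, the fixed forward method (a sketch that reuses the layers already defined in __init__) would look like this:

  def forward(self, x):
    # input: (batch_size, 3, 32, 32)
    x = F.relu(self.conv1(x))     # -> (batch_size, 18, 32, 32)
    x = self.pool(x)              # -> (batch_size, 18, 16, 16)
    x = x.view(-1, 18 * 16 * 16)  # -> (batch_size, 4608)
    x = F.relu(self.fc1(x))       # -> (batch_size, 64)
    x = self.fc2(x)               # -> (batch_size, 10)
    return x

As a more defensive alternative, x = x.view(x.size(0), -1) (or torch.flatten(x, 1)) pins the batch dimension explicitly, so a wrong feature count raises an error immediately instead of silently reshaping the batch.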