loss increasing significantly in training loop

My training loss is increasing sharply and I don't know why. I suspect it has something to do with how I compute the loss, but I'm not sure. It might be because I'm printing the running loss instead of the per-batch loss. Here is my training loop:

def train_model(model, optimizer, train_loader,  num_epochs, criterion=criterion):
  
  total_epochs = notebook.tqdm(range(num_epochs))

  model.train()

  running_loss=0
  correct=0
  total=0

  for epoch in total_epochs:
    for i, (x_train, y_train) in enumerate(train_loader):

      x_train = x_train.to(device)
      y_train = y_train.to(device)
        
      y_pred = model(x_train)
      loss = criterion(y_pred, y_train)

      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      running_loss += loss.item()
      
      _, predicted = y_pred.max(1)
      train_loss=running_loss/len(train_loader)


      total += y_train.size(0)
      correct += predicted.eq(y_train).sum().item()
        
    train_loss=running_loss/len(train_loader)
    train_accu=100.*correct/total

    print('Train Loss: %.3f | Train Accuracy: %.3f'%(train_loss,train_accu))

But when I call train_model():

train_md = train_model(cnn_net, optimizer, data_loaders['train'], 10)

it returns this:

Train Loss: 1.472 | Train Accuracy: 47.949
Train Loss: 2.655 | Train Accuracy: 53.324
Train Loss: 3.732 | Train Accuracy: 56.521
Train Loss: 4.750 | Train Accuracy: 58.565
Train Loss: 5.728 | Train Accuracy: 60.130
Train Loss: 6.673 | Train Accuracy: 61.364
Train Loss: 7.590 | Train Accuracy: 62.335
Train Loss: 8.484 | Train Accuracy: 63.190
Train Loss: 9.365 | Train Accuracy: 63.934
Train Loss: 10.225 | Train Accuracy: 64.571

You keep accumulating the loss into running_loss across epochs without ever resetting it, so running_loss/len(train_loader) grows with every epoch. That's why the printed loss keeps increasing!
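
The fix is to reset running_loss (and, if you want per-epoch accuracy, correct and total) at the top of each epoch, so the printed value is the average batch loss of that epoch only. Here is a minimal sketch of your loop with the counters moved inside the epoch loop; it assumes notebook, device and criterion are defined exactly as in your code:

def train_model(model, optimizer, train_loader, num_epochs, criterion=criterion):

  model.train()

  for epoch in notebook.tqdm(range(num_epochs)):
    # reset the per-epoch statistics here, not once before all epochs
    running_loss = 0.0
    correct = 0
    total = 0

    for x_train, y_train in train_loader:
      x_train = x_train.to(device)
      y_train = y_train.to(device)

      y_pred = model(x_train)
      loss = criterion(y_pred, y_train)

      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      running_loss += loss.item()                    # sum of batch losses for this epoch only
      _, predicted = y_pred.max(1)
      total += y_train.size(0)
      correct += predicted.eq(y_train).sum().item()

    train_loss = running_loss / len(train_loader)    # average batch loss of this epoch
    train_accu = 100. * correct / total
    print('Train Loss: %.3f | Train Accuracy: %.3f' % (train_loss, train_accu))

Note that your model is actually improving: the differences between consecutive printed values (about 1.47, 1.18, 1.08, 1.02, ...) are the per-epoch averages, and they are going down.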