Validation accuracy and loss is the same after each epoch

My validation accuracy stays the same after every epoch, and I'm not sure what I'm doing wrong. I've added my CNN network and my training function below; I initialize the CNN once. The training function itself works fine: the loss goes down and the accuracy improves with each epoch. I also wrote a test function with the same structure as my validation function, and the same thing happens there. My train/val split is 40000/10000 and I'm using CIFAR-10.

Here is my code:


#Make train function (simple at first)
def train_network(model, optimizer, train_loader, num_epochs=10):

  total_epochs = notebook.tqdm(range(num_epochs))
  model.train()

  for epoch in total_epochs:
    train_acc = 0.0
    running_loss = 0.0

    for i, (x_train, y_train) in enumerate(train_loader):
      x_train, y_train = x_train.to(device), y_train.to(device)

      y_pred = model(x_train)
      loss = criterion(y_pred, y_train)
    
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      running_loss += loss.item()
      train_acc += accuracy(y_pred, y_train)

    running_loss /= len(train_loader)
    train_acc /= len(train_loader)

    print('Evaluation Loss: %.3f | Evaluation Accuracy: %.3f'%(running_loss, train_acc))


@torch.no_grad()
def validate_network(model, optimizer, val_loader, num_epochs=10):
  model.eval()
  total_epochs = notebook.tqdm(range(num_epochs))


  for epoch in total_epochs:  
    accu = 0.0
    running_loss = 0.0

    for i, (x_val, y_val) in enumerate(val_loader):
      x_val, y_val = x_val.to(device), y_val.to(device)

      val_pred = model(x_val)
      loss = criterion(val_pred, y_val)

      running_loss += loss.item()
      accu += accuracy(val_pred, y_val)

    running_loss /= len(val_loader)
    accu /= len(val_loader)

    
    print('Val Loss: %.3f | Val Accuracy: %.3f'%(running_loss,accu))
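
For reference, here is a minimal sketch of the setup the two functions above assume (device, criterion, an accuracy helper, and the CIFAR-10 loaders with the 40000/10000 split); the exact definitions in my notebook may differ slightly:

# Assumed setup (not shown above): device, loss criterion, accuracy helper,
# and CIFAR-10 loaders with a 40000/10000 train/val split.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from tqdm import notebook

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
criterion = nn.CrossEntropyLoss()

def accuracy(y_pred, y_true):
  # fraction of correct predictions in one batch
  return (y_pred.argmax(dim=1) == y_true).float().mean().item()

transform = transforms.ToTensor()
full_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_set, val_set = random_split(full_train, [40000, 10000])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)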

Output:

Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786
Val Loss: 0.623 | Val Accuracy: 0.786

So I guess my question is: how do I get a representative per-epoch accuracy and loss when validating?

What is happening here is that you run a loop over num_epochs and simply evaluate the same, unchanged network several times. I would suggest calling the validation function during training, at the end of each epoch, to test how each epoch improves the model's performance. That means the training function should look something like this:

def train_network(model, optimizer, train_loader, val_loader, num_epochs=10):

  total_epochs = notebook.tqdm(range(num_epochs))
  model.train()

  for epoch in total_epochs:
    train_acc = 0.0
    running_loss = 0.0

    for i, (x_train, y_train) in enumerate(train_loader):
      x_train, y_train = x_train.to(device), y_train.to(device)

      y_pred = model(x_train)
      loss = criterion(y_pred, y_train)
    
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()

      running_loss += loss.item()
      train_acc += accuracy(y_pred, y_train)

    running_loss /= len(train_loader)
    train_acc /= len(train_loader)

    print('Evaluation Loss: %.3f | Evaluation Accuracy: %.3f'%(running_loss, train_acc))
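    # run validation once at the end of every epoch to track generalization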
    validate_network(model, optimizer, val_loader, num_epochs=1)

Note that I added the validation loader as an input and call the validation function at the end of each epoch, with its number of epochs set to 1. A small additional change would be to remove the epoch loop from the validation function altogether.
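
If you do drop the epoch loop, the per-epoch call simply becomes validate_network(model, val_loader). A rough sketch of that simplified function (reusing the criterion and accuracy helpers from the question) could look like the following; I also switch the model back to train mode at the end, since model.eval() would otherwise stay in effect for the next training epoch:

@torch.no_grad()
def validate_network(model, val_loader):
  model.eval()
  running_loss, accu = 0.0, 0.0

  for x_val, y_val in val_loader:
    x_val, y_val = x_val.to(device), y_val.to(device)
    val_pred = model(x_val)
    running_loss += criterion(val_pred, y_val).item()
    accu += accuracy(val_pred, y_val)

  running_loss /= len(val_loader)
  accu /= len(val_loader)
  print('Val Loss: %.3f | Val Accuracy: %.3f' % (running_loss, accu))
  model.train()  # restore train mode so dropout/batch-norm behave correctly next epoch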