BERT model loss function from one hot encoded labels

For the line `loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)` I one-hot encoded the labels, making them a 32x17 tensor, since the batch size is 32 and there are 17 text classes. However, the BERT model only accepts labels given as a one-dimensional vector, so I get the error:

Expected input batch_size (32) to match target batch_size (544)

544 is the product of 32 × 17. My question, however, is how I can use one-hot encoded labels to get the loss value at each iteration. I could simply use label-encoded (integer) labels instead, but that does not seem appropriate for unordered categories.
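For context on where 544 comes from: BertForSequenceClassification computes the loss roughly as loss_fct(logits.view(-1, num_labels), labels.view(-1)), so a [32, 17] one-hot tensor is flattened into 544 target entries while the logits keep only 32 rows. A minimal sketch that reproduces the mismatch, with random logits standing in for the model output:

import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, num_labels = 32, 17
logits = torch.randn(batch_size, num_labels)             # stand-in for the classifier output
class_idx = torch.randint(0, num_labels, (batch_size,))  # integer class labels
one_hot = F.one_hot(class_idx, num_classes=num_labels)   # shape [32, 17]

loss_fct = nn.CrossEntropyLoss()
try:
    # mirrors the model's internal loss_fct(logits.view(-1, num_labels), labels.view(-1))
    loss_fct(logits.view(-1, num_labels), one_hot.view(-1))
except ValueError as e:
    print(e)  # Expected input batch_size (32) to match target batch_size (544)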

from tqdm import trange  # trange drives the epoch progress bar

# BERT training loop
train_loss_set = []  # collects the per-batch loss values
for _ in trange(epochs, desc="Epoch"):
  
  ## TRAINING
  
  # Set our model to training mode
  model.train()  
  # Tracking variables
  tr_loss = 0
  nb_tr_examples, nb_tr_steps = 0, 0
  # Train the data for one epoch
  for step, batch in enumerate(train_dataloader):
    # Add batch to GPU
    batch = tuple(t.to(device) for t in batch)
    # Unpack the inputs from our dataloader
    b_input_ids, b_input_mask, b_labels = batch
    # Clear out the gradients (by default they accumulate)
    optimizer.zero_grad()
    # Forward pass; with labels supplied, this (pytorch-pretrained-bert style)
    # API returns the loss directly rather than an output tuple
    loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
    train_loss_set.append(loss.item())
    # Backward pass
    loss.backward()
    # Update parameters and take a step using the computed gradient
    optimizer.step()
    # Update tracking variables
    tr_loss += loss.item()
    nb_tr_examples += b_input_ids.size(0)
    nb_tr_steps += 1
  print("Train loss: {}".format(tr_loss/nb_tr_steps))

As mentioned in the comments, BERT for sequence classification expects the target tensor to be of size [batch], with values in the range [0, num_labels). A one-hot encoded tensor can be converted by taking the argmax over the label dimension, i.e. labels=b_labels.argmax(dim=1).
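A minimal sketch of that fix inside the training loop above; only the forward-pass call changes:

# convert the one-hot batch back to class indices before the forward pass
b_labels_idx = b_labels.argmax(dim=1)  # [32, 17] -> [32]
loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels_idx)

If you want to keep the one-hot (or soft) targets themselves, an alternative, assuming the same API as the loop above (calling the model without labels returns the logits), is to compute the cross-entropy from the logits yourself:

import torch.nn.functional as F

logits = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask)
# negative log-likelihood of the one-hot rows under the log-softmax distribution
loss = -(b_labels.float() * F.log_softmax(logits, dim=1)).sum(dim=1).mean()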