BERT model loss function from one-hot encoded labels
For the line:

loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)

I one-hot encoded my labels, so they form a 32x17 tensor, since the batch size is 32 and there are 17 text classes. However, the BERT model only accepts labels as a one-dimensional vector, so I get the error:

Expected input batch_size (32) to match target batch_size (544)

544 is the product of 32x17. My question is: how can I get the loss value at each iteration while using one-hot encoded labels? I could use label-encoded (integer) labels instead, but that does not seem appropriate for unordered categories.
# BERT training loop
for _ in trange(epochs, desc="Epoch"):

    ## TRAINING
    # Set our model to training mode
    model.train()
    # Tracking variables
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    # Train the data for one epoch
    for step, batch in enumerate(train_dataloader):
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        # Clear out the gradients (by default they accumulate)
        optimizer.zero_grad()
        # Forward pass
        loss = model(b_input_ids, token_type_ids=None, attention_mask=b_input_mask, labels=b_labels)
        train_loss_set.append(loss.item())
        # Backward pass
        loss.backward()
        # Update parameters and take a step using the computed gradient
        optimizer.step()
        # Update tracking variables
        tr_loss += loss.item()
        nb_tr_examples += b_input_ids.size(0)
        nb_tr_steps += 1
    print("Train loss: {}".format(tr_loss / nb_tr_steps))
As mentioned in the comments, BERT for sequence classification expects the target to be a [batch]-sized tensor whose values lie in the range [0, num_labels). A one-hot encoded tensor can be converted by taking the argmax over the label dimension, i.e. labels=b_labels.argmax(dim=1). Since cross-entropy treats these indices as unordered class IDs, this loses nothing compared to the one-hot representation.
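A minimal sketch of that conversion, using a hypothetical batch of 4 examples with the question's 17 classes (the variable name b_labels mirrors the training loop; the class indices here are made up for illustration):

```python
import torch

# Hypothetical one-hot label batch: 4 examples, 17 classes,
# with true classes 3, 0, 16 and 5 respectively.
true_classes = torch.tensor([3, 0, 16, 5])
b_labels = torch.zeros(4, 17)
b_labels[torch.arange(4), true_classes] = 1.0

# Convert to the [batch]-sized class-index tensor BERT expects.
class_indices = b_labels.argmax(dim=1)
print(class_indices)  # tensor([ 3,  0, 16,  5])
```

In the loop above this would become labels=b_labels.argmax(dim=1) in the forward pass; alternatively, the conversion can be done once when building the dataset, which avoids storing the larger one-hot tensors at all.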