Constant Training Loss and Validation Loss
I am building an RNN model with the PyTorch library to do sentiment analysis on movie reviews, but the training loss and validation loss somehow stay constant throughout training. I have looked through various online resources but am still stuck.
Could someone take a look at my code?
Some of the parameters are specified by the assignment:
embedding_dim = 64
n_layers = 1
n_hidden = 128
dropout = 0.5
batch_size = 32
My main code:
txt_field = data.Field(tokenize=word_tokenize, lower=True, include_lengths=True, batch_first=True)
label_field = data.Field(sequential=False, use_vocab=False, batch_first=True)
train = data.TabularDataset(path=part2_filepath+"train_Copy.csv", format='csv',
                            fields=[('label', label_field), ('text', txt_field)], skip_header=True)
validation = data.TabularDataset(path=part2_filepath+"validation_Copy.csv", format='csv',
                                 fields=[('label', label_field), ('text', txt_field)], skip_header=True)
txt_field.build_vocab(train, min_freq=5)
label_field.build_vocab(train, min_freq=2)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
train_iter, valid_iter, test_iter = data.BucketIterator.splits(
    (train, validation, test),
    batch_size=32,
    sort_key=lambda x: len(x.text),
    sort_within_batch=True,
    device=device)
n_vocab = len(txt_field.vocab)
embedding_dim = 64
n_hidden = 128
n_layers = 1
dropout = 0.5
model = Text_RNN(n_vocab, embedding_dim, n_hidden, n_layers, dropout)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = torch.nn.BCELoss().to(device)
N_EPOCHS = 15
best_valid_loss = float('inf')
for epoch in range(N_EPOCHS):
    train_loss, train_acc = RNN_train(model, train_iter, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iter, criterion)
My model:
class Text_RNN(nn.Module):
    def __init__(self, n_vocab, embedding_dim, n_hidden, n_layers, dropout):
        super(Text_RNN, self).__init__()
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.emb = nn.Embedding(n_vocab, embedding_dim)
        self.rnn = nn.RNN(
            input_size=embedding_dim,
            hidden_size=n_hidden,
            num_layers=n_layers,
            dropout=dropout,
            batch_first=True
        )
        self.sigmoid = nn.Sigmoid()
        self.linear = nn.Linear(n_hidden, 2)

    def forward(self, sent, sent_len):
        sent_emb = self.emb(sent)
        outputs, hidden = self.rnn(sent_emb)
        prob = self.sigmoid(self.linear(hidden.squeeze(0)))
        return prob
Training function:
def RNN_train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        text, text_lengths = batch.text
        predictions = model(text, text_lengths)
        batch.label = batch.label.type(torch.FloatTensor).squeeze()
        predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
        loss = criterion(predictions, batch.label)
        loss.requires_grad = True
        acc = binary_accuracy(predictions, batch.label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)
Output of a run with 10 test reviews + 5 validation reviews:
Epoch [1/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
Epoch [2/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
Epoch [3/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
Epoch [4/15]: Train Loss: 15.351 | Train Acc: 44.44% Val. Loss: 11.052 | Val. Acc: 60.00%
...
I would appreciate it if someone could point me in the right direction. I believe the issue lies in the training code, since for most of it I followed this article:
https://www.analyticsvidhya.com/blog/2020/01/first-text-classification-in-pytorch/
In your training loop you are using the indices from the max operation, which are not differentiable, so you cannot track gradients through them. Since they are not differentiable, everything that follows does not track gradients either. Calling loss.backward() would fail.
# The indices of the max operation are not differentiable
predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
loss = criterion(predictions, batch.label)
# Setting requires_grad to True to make .backward() work, although incorrectly.
loss.requires_grad = True
Presumably you wanted to work around that by setting requires_grad, but that does not do what you expect, because no gradients are propagated to your model: the only thing in your computational graph is the loss itself, and from there there is nowhere to go.
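To make this concrete, here is a small self-contained sketch (a toy linear model, not your Text_RNN) showing that a loss built from detached indices never sends a gradient back to the model, even with requires_grad forced on:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                          # toy stand-in for the real model
x = torch.randn(3, 4)
target = torch.tensor([0., 1., 1.])

out = model(x)                                   # still part of the graph
preds = torch.max(out.data, 1).indices.float()   # .data + indices: detached, no grad_fn
loss = nn.BCELoss()(preds, target)               # loss is now a leaf tensor with no graph

loss.requires_grad = True                        # only makes backward() run, nothing more
loss.backward()
print(model.weight.grad)                         # None: no gradient ever reached the model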
You get 0 or 1 from the indices because your model's output is essentially two classes and you take the one with the higher probability. For the binary cross-entropy loss you only need one class, with a value between 0 and 1 (continuous), which is exactly what the sigmoid function gives you.
So you need to change the output size of the last linear layer to 1:
self.linear = nn.Linear(n_hidden, 1)
And in your training loop you can drop the torch.max call as well as the requires_grad workaround.
# Squeeze the model's output to get rid of the single class dimension
predictions = model(text, text_lengths).squeeze()
batch.label = batch.label.type(torch.FloatTensor).squeeze()
loss = criterion(predictions, batch.label)
acc = binary_accuracy(predictions, batch.label)
optimizer.zero_grad()
loss.backward()
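Putting it together, a sketch of how the adjusted model could look (essentially the same as yours, only the head changes to a single output) is shown below; treat it as a reference, not the only way to write it:

import torch.nn as nn

class Text_RNN(nn.Module):
    def __init__(self, n_vocab, embedding_dim, n_hidden, n_layers, dropout):
        super(Text_RNN, self).__init__()
        self.emb = nn.Embedding(n_vocab, embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=n_hidden,
                          num_layers=n_layers, dropout=dropout, batch_first=True)
        self.linear = nn.Linear(n_hidden, 1)   # one output instead of two
        self.sigmoid = nn.Sigmoid()

    def forward(self, sent, sent_len):
        sent_emb = self.emb(sent)
        outputs, hidden = self.rnn(sent_emb)
        # hidden: [n_layers, batch, n_hidden] -> [batch, n_hidden] for a single layer
        prob = self.sigmoid(self.linear(hidden.squeeze(0)))
        return prob                            # shape [batch, 1], squeezed in the training loop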
Since you only have one output at the end, the actual predictions are either 0 or 1 (nothing in between). To get there you can simply use 0.5 as the threshold, so everything below it counts as 0 and everything above it counts as 1. If you are using the binary_accuracy function from the article you are following, that is done for you automatically; they do it by rounding with torch.round.
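In case you need to write that helper yourself, a minimal sketch that matches the behaviour described above (assuming preds are sigmoid outputs and y holds 0/1 labels of the same shape) could look like this:

import torch

def binary_accuracy(preds, y):
    # Threshold at 0.5 by rounding the sigmoid outputs to 0 or 1
    rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)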