Constant Training Loss and Validation Loss

I am building an RNN model with the PyTorch library to run sentiment analysis on movie reviews, but the training loss and validation loss somehow stay constant throughout training. I have looked through various online resources and am still stuck.

Could someone help take a look at my code?

Some of the parameters are specified by the assignment:

embedding_dim = 64
n_layers = 1
n_hidden = 128
dropout = 0.5
batch_size = 32

My main code:

txt_field = data.Field(tokenize=word_tokenize, lower=True, include_lengths=True, batch_first=True)
label_field = data.Field(sequential=False, use_vocab=False, batch_first=True)

train = data.TabularDataset(path=part2_filepath+"train_Copy.csv", format='csv',
                            fields=[('label', label_field), ('text', txt_field)], skip_header=True)
validation = data.TabularDataset(path=part2_filepath+"validation_Copy.csv", format='csv',
                            fields=[('label', label_field), ('text', txt_field)], skip_header=True)

txt_field.build_vocab(train, min_freq=5)
label_field.build_vocab(train, min_freq=2)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
train_iter, valid_iter, test_iter = data.BucketIterator.splits(
    (train, validation, test),
    batch_size=32,
    sort_key=lambda x: len(x.text),
    sort_within_batch=True,
    device=device)

n_vocab = len(txt_field.vocab)
embedding_dim = 64
n_hidden = 128
n_layers = 1
dropout = 0.5

model = Text_RNN(n_vocab, embedding_dim, n_hidden, n_layers, dropout)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = torch.nn.BCELoss().to(device)

N_EPOCHS = 15
best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    train_loss, train_acc = RNN_train(model, train_iter, optimizer, criterion)
    valid_loss, valid_acc = evaluate(model, valid_iter, criterion)

My model:

class Text_RNN(nn.Module):
    def __init__(self, n_vocab, embedding_dim, n_hidden, n_layers, dropout):
        super(Text_RNN, self).__init__()
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.emb = nn.Embedding(n_vocab, embedding_dim)
        self.rnn = nn.RNN(
            input_size=embedding_dim,
            hidden_size=n_hidden,
            num_layers=n_layers,
            dropout=dropout,
            batch_first=True
        )
        self.sigmoid = nn.Sigmoid()
        self.linear = nn.Linear(n_hidden, 2)

    def forward(self, sent, sent_len):
        sent_emb = self.emb(sent)
        outputs, hidden = self.rnn(sent_emb)
        prob = self.sigmoid(self.linear(hidden.squeeze(0)))

        return prob

The training function:

def RNN_train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        text, text_lengths = batch.text
        predictions = model(text, text_lengths)
        batch.label = batch.label.type(torch.FloatTensor).squeeze()
        predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
        loss = criterion(predictions, batch.label)
        loss.requires_grad = True
        acc = binary_accuracy(predictions, batch.label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()

    return epoch_loss / len(iterator), epoch_acc / len(iterator)

Output from a test run with 10 reviews + 5 validation reviews:

Epoch [1/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
Epoch [2/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
Epoch [3/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
Epoch [4/15]:   Train Loss: 15.351 | Train Acc: 44.44%  Val. Loss: 11.052 |  Val. Acc: 60.00%
...

I would appreciate it if someone could point me in the right direction. I believe it has to do with the training code, since for most of it I followed this article: https://www.analyticsvidhya.com/blog/2020/01/first-text-classification-in-pytorch/

In your training loop you are using the indices from the max operation, which is not differentiable, so you cannot track gradients through it. Because it is not differentiable, everything after it does not track gradients either, and calling loss.backward() would fail.

# The indices of the max operation are not differentiable
predictions = torch.max(predictions.data, 1).indices.type(torch.FloatTensor)
loss = criterion(predictions, batch.label)
# Setting requires_grad to True to make .backward() work, although incorrectly.
loss.requires_grad = True

Presumably you wanted to work around that by setting requires_grad, but that does not do what you expect, because no gradients are propagated to your model: the only thing in your computational graph is the loss itself, and there is nowhere to go from there.
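A minimal stand-alone sketch (using a stand-in tensor instead of your model) that shows both issues: the indices returned by torch.max carry no grad_fn, and flipping loss.requires_grad only lets backward() run without sending any gradient back to the parameters:

import torch

logits = torch.randn(4, 2, requires_grad=True)   # stand-in for the model output
labels = torch.tensor([0., 1., 1., 0.])

# The argmax indices are integer class ids with no grad_fn,
# so the connection to `logits` (and the model behind it) is cut here.
preds = torch.max(logits, 1).indices.type(torch.FloatTensor)
print(preds.requires_grad)   # False -> gradients cannot flow back

loss = torch.nn.BCELoss()(preds, labels)
loss.requires_grad = True    # makes .backward() run ...
loss.backward()
print(logits.grad)           # ... but prints None: nothing ever reaches the "model"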

You get either 0 or 1 from the indices because your model essentially outputs two classes and you take the one with the higher probability. For binary cross-entropy loss you only need a single class with a value between 0 and 1 (continuous), which you obtain by applying the sigmoid function.

So you need to change the output size of the last linear layer to 1:

self.linear = nn.Linear(n_hidden, 1)
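For reference, here is a sketch of how the adjusted model could look. It keeps your architecture and only changes the final layer (taking the last layer's hidden state instead of squeeze(0) is just a small generalisation, not required for n_layers=1); treat it as a sketch rather than a drop-in:

import torch.nn as nn

class Text_RNN(nn.Module):
    def __init__(self, n_vocab, embedding_dim, n_hidden, n_layers, dropout):
        super(Text_RNN, self).__init__()
        self.emb = nn.Embedding(n_vocab, embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=n_hidden,
                          num_layers=n_layers, dropout=dropout, batch_first=True)
        self.linear = nn.Linear(n_hidden, 1)   # single output for BCELoss
        self.sigmoid = nn.Sigmoid()

    def forward(self, sent, sent_len):
        sent_emb = self.emb(sent)
        outputs, hidden = self.rnn(sent_emb)
        # hidden: (n_layers, batch, n_hidden) -> use the last layer's state
        return self.sigmoid(self.linear(hidden[-1]))   # shape (batch, 1)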

And in your training loop you can drop the torch.max call as well as the requires_grad workaround:

# Squeeze the model's output to get rid of the single class dimension
predictions = model(text, text_lengths).squeeze()
batch.label = batch.label.type(torch.FloatTensor).squeeze()
loss = criterion(predictions, batch.label)
acc = binary_accuracy(predictions, batch.label)
optimizer.zero_grad()
loss.backward()
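Putting it together, the training function could look roughly like this (a sketch based on your RNN_train, assuming the model now returns a single probability per example):

def RNN_train(model, iterator, optimizer, criterion):
    epoch_loss = 0
    epoch_acc = 0
    model.train()
    for batch in iterator:
        text, text_lengths = batch.text
        # Model output: (batch, 1) probabilities -> squeeze to (batch,)
        predictions = model(text, text_lengths).squeeze()
        labels = batch.label.type(torch.FloatTensor).squeeze()
        loss = criterion(predictions, labels)        # BCELoss on probabilities
        acc = binary_accuracy(predictions, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        epoch_acc += acc.item()
    return epoch_loss / len(iterator), epoch_acc / len(iterator)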

Since you only have one class at the end, an actual prediction is either 0 or 1 (nothing in between). To achieve that you can simply use 0.5 as the threshold: everything below it is considered a 0 and everything above it a 1. If you are using the binary_accuracy function from the article you were following, that is done for you automatically; they do it by rounding with torch.round.
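If you do not have that helper at hand, a minimal version along the lines of the article would be (my reconstruction, not your code):

def binary_accuracy(preds, y):
    # Round the probabilities to 0/1 at the 0.5 threshold,
    # then count how many match the labels.
    rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    return correct.sum() / len(correct)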