Pytorch Binary Classification RNN Model not Learning
I'm working on a binary classification task with Pytorch, and my model is not learning. I can't figure out whether it is a problem with the model or with the data.
Here is my model:
from torch import nn

class RNN(nn.Module):
    def __init__(self, input_dim):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(input_size=input_dim, hidden_size=64,
                          num_layers=2,
                          batch_first=True, bidirectional=True)
        self.norm = nn.BatchNorm1d(128)
        self.rnn2 = nn.RNN(input_size=128, hidden_size=64,
                           num_layers=2,
                           batch_first=True, bidirectional=False)
        self.drop = nn.Dropout(0.5)
        self.fc7 = nn.Linear(64, 2)
        self.sigmoid2 = nn.Softmax(dim=2)

    def forward(self, x):
        out, h_n = self.rnn(x)
        out = out.permute(0, 2, 1)
        out = self.norm(out)
        out = out.permute(0, 2, 1)
        out, h_n = self.rnn2(out)
        out = self.drop(out)
        out = self.fc7(out)
        out = self.sigmoid2(out)
        return out.squeeze()
The model consists of two RNN layers with a BatchNorm in between, then a Dropout and the final layer. Since I evaluate with two classes, I use a Softmax function instead of a Sigmoid.
Then I create and train the model:
model = RNN(2476)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_function = nn.CrossEntropyLoss()
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

model.train()
EPOCHS = 25
BATCH_SIZE = 64
epoch_loss = []

for ii in range(EPOCHS):
    for i in range(1, X_train.size()[0]//BATCH_SIZE+1):
        x_train = X_train[(i-1)*BATCH_SIZE:i*BATCH_SIZE]
        labels = y_train[(i-1)*BATCH_SIZE:i*BATCH_SIZE]
        optimizer.zero_grad()
        y_pred = model(x_train)
        y_pred = y_pred.round()
        single_loss = loss_function(y_pred, labels.long().squeeze())
        single_loss.backward()
        optimizer.step()
        lr_scheduler.step()
        print(f"\rBatch {i}/{X_train.size()[0]//BATCH_SIZE+1} Trained: {i*BATCH_SIZE}/{X_train.size()[0]} Loss: {single_loss.item():10.8f} Step: {lr_scheduler.get_lr()}", end="")
    epoch_loss.append(single_loss.item())
    print(f'\nepoch: {ii:3} loss: {single_loss.item():10.8f}')
Here is the output while training the model:
Batch 353/354 Trained: 22592/22644 Loss: 0.86013699 Step: [1.0000000000000007e-21]
epoch: 0 loss: 0.86013699
Batch 353/354 Trained: 22592/22644 Loss: 0.81326193 Step: [1.0000000000000014e-33]
epoch: 1 loss: 0.81326193
Batch 353/354 Trained: 22592/22644 Loss: 0.87576205 Step: [1.0000000000000022e-45]
epoch: 2 loss: 0.87576205
Batch 353/354 Trained: 22592/22644 Loss: 0.92263710 Step: [1.0000000000000026e-57]
epoch: 3 loss: 0.92263710
Batch 353/354 Trained: 22592/22644 Loss: 0.90701210 Step: [1.0000000000000034e-68]
epoch: 4 loss: 0.90701210
Batch 353/354 Trained: 22592/22644 Loss: 0.92263699 Step: [1.0000000000000039e-80]
epoch: 5 loss: 0.92263699
Batch 353/354 Trained: 22592/22644 Loss: 0.82888693 Step: [1.0000000000000044e-92]
epoch: 6 loss: 0.82888693
Batch 353/354 Trained: 22592/22644 Loss: 0.81326193 Step: [1.000000000000005e-104]
epoch: 7 loss: 0.81326193
Batch 353/354 Trained: 22592/22644 Loss: 0.87576205 Step: [1.0000000000000055e-115]
epoch: 8 loss: 0.87576205
Batch 353/354 Trained: 22592/22644 Loss: 0.82888693 Step: [1.0000000000000062e-127]
epoch: 9 loss: 0.82888693
Batch 353/354 Trained: 22592/22644 Loss: 0.81326199 Step: [1.0000000000000067e-139]
epoch: 10 loss: 0.81326199
Batch 353/354 Trained: 22592/22644 Loss: 0.82888693 Step: [1.0000000000000072e-151]
epoch: 11 loss: 0.82888693
Batch 353/354 Trained: 22592/22644 Loss: 0.89138699 Step: [1.0000000000000076e-162]
epoch: 12 loss: 0.89138699
Batch 353/354 Trained: 22592/22644 Loss: 0.82888699 Step: [1.000000000000008e-174]
epoch: 13 loss: 0.82888699
Batch 353/354 Trained: 22592/22644 Loss: 0.82888687 Step: [1.0000000000000089e-186]
epoch: 14 loss: 0.82888687
Batch 353/354 Trained: 22592/22644 Loss: 0.82888693 Step: [1.0000000000000096e-198]
epoch: 15 loss: 0.82888693
Batch 353/354 Trained: 22592/22644 Loss: 0.84451199 Step: [1.0000000000000103e-210]
epoch: 16 loss: 0.84451199
Batch 353/354 Trained: 22592/22644 Loss: 0.96951205 Step: [1.0000000000000111e-221]
epoch: 17 loss: 0.96951205
Batch 353/354 Trained: 22592/22644 Loss: 0.87576205 Step: [1.0000000000000117e-233]
epoch: 18 loss: 0.87576205
Batch 353/354 Trained: 22592/22644 Loss: 0.89138705 Step: [1.0000000000000125e-245]
epoch: 19 loss: 0.89138705
Batch 353/354 Trained: 22592/22644 Loss: 0.79763699 Step: [1.0000000000000133e-257]
epoch: 20 loss: 0.79763699
Batch 353/354 Trained: 22592/22644 Loss: 0.84451199 Step: [1.0000000000000138e-268]
epoch: 21 loss: 0.84451199
Batch 353/354 Trained: 22592/22644 Loss: 0.84451205 Step: [1.0000000000000146e-280]
epoch: 22 loss: 0.84451205
Batch 353/354 Trained: 22592/22644 Loss: 0.79763693 Step: [1.0000000000000153e-292]
epoch: 23 loss: 0.79763693
Batch 353/354 Trained: 22592/22644 Loss: 0.87576205 Step: [1.000000000000016e-304]
epoch: 24 loss: 0.87576205
Here is the loss at each epoch:
As for the data, each sample of the input has one feature dimension of (2474,) and the target has one dimension ([1] or [0]); I then add the sequence length dimension (1) to the input data for the RNN layers:
X_train.size(), X_test.size(), y_train.size(), y_test.size()
(torch.Size([22644, 1, 2474]),
torch.Size([5661, 1, 2474]),
torch.Size([22644, 1]),
torch.Size([5661, 1]))
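To illustrate the reshaping step described above, here is a small sketch (with random stand-in data of the same sizes) of adding the sequence-length dimension via `unsqueeze`, so the tensor matches the (batch, seq_len, features) layout that `batch_first=True` RNNs expect:

```python
import torch

# Stand-in data with the same sizes as above: 22644 samples, 2474 features.
X_train = torch.randn(22644, 2474)
y_train = torch.randint(0, 2, (22644, 1))

# Insert a sequence-length dimension of 1 at position 1, giving
# (batch, seq_len, features) as expected by an RNN with batch_first=True.
X_train = X_train.unsqueeze(1)
print(X_train.shape)  # torch.Size([22644, 1, 2474])
```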
Class distribution of the targets:
I don't understand why my model isn't learning. The classes are balanced, and I haven't noticed anything wrong with the data. Any suggestions?
This is not a direct solution to your question, but what was the process that led to this architecture? I find it helpful to increase complexity iteratively, if only to make identifying the problem more trivial (what did I add right before the problem appeared?).
To save time while iteratively building up an RNN, you could try single-batch training, in which you build a network that can overfit a single training batch. If your network can overfit a single training batch, it should be complex enough to learn the features in your training data.
Once you have an architecture that can easily overfit a single training batch, you can train on the whole training set and explore additional strategies to address the overfitting through regularization.
Your model doesn't seem overly complex, but applying this idea could mean starting from a single RNN layer and a single linear layer, and checking whether your loss changes at all on a single batch.
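A minimal sketch of that single-batch test might look like the following. The model name, sizes, and random data here are illustrative, not taken from the question; note that the loss is computed on the raw logits (no Softmax and no .round(), both of which would prevent useful gradients from flowing, since CrossEntropyLoss expects logits and rounding is not differentiable):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Deliberately minimal: one RNN layer and one linear head.
class TinyRNN(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.rnn = nn.RNN(input_size=input_dim, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, 2)

    def forward(self, x):
        out, _ = self.rnn(x)        # out: (batch, seq_len, 64)
        return self.fc(out[:, -1])  # logits from the last time step

# One fixed batch of random data standing in for a real training batch.
x = torch.randn(64, 1, 2474)
y = torch.randint(0, 2, (64,))

model = TinyRNN(2474)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_function = nn.CrossEntropyLoss()  # takes raw logits, applies log-softmax itself

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_function(model(x), y)  # no .round() -- keep the graph intact
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss should fall toward zero
```

If the loss does not move on this single batch, the problem is in the model or the training loop rather than the data; if it drops to near zero, the architecture is expressive enough and you can move on to the full dataset.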