Difficulty setting batch size correctly in 2 layer RNN
I am building an RNN that produces a multi-class classification output over 11 classes. The inputs are word embeddings that I extract from a pretrained GloVe model.

The error I get is (full traceback at the end of the question):

ValueError: Expected input batch_size (1) to match target batch_size (11).

Note that I am using batch_size=1 here, and the error says "Expected input batch_size (1) to match target batch_size (11)". However, if I change the batch size to 11, the error changes to:

ValueError: Expected input batch_size (11) to match target batch_size (121).

I think the error comes from the shape of text, which is torch.Size([11, 300]): it is missing the sequence length. My understanding was that if I don't provide a sequence length it defaults to 1, but I don't know how to add it in.
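To illustrate what I mean, this is roughly how I imagine the sequence dimension would be added with unsqueeze (I am not sure which position is correct, this is just to show the shapes):

import torch

text = torch.randn(11, 300)  # what I currently get from the loader: (11, 300)

# With the default nn.RNN layout (seq_len, batch, input_size),
# a length-1 sequence dimension would go in front:
print(text.unsqueeze(0).shape)  # torch.Size([1, 11, 300])

# With batch_first=True the layout is (batch, seq_len, input_size),
# so the length-1 sequence dimension would go in the middle instead:
print(text.unsqueeze(1).shape)  # torch.Size([11, 1, 300])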
Training loop:
def train(model, device, train_loader, valid_loader, epochs, learning_rate):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    train_loss, validation_loss = [], []
    train_acc, validation_acc = [], []

    for epoch in range(epochs):
        # train
        model.train()
        running_loss = 0.
        correct, total = 0, 0
        steps = 0

        for idx, batch in enumerate(train_loader):
            text = batch["Sample"].to(device)
            target = batch['Class'].to(device)
            print(text.shape, target.shape)
            text, target = text.to(device), target.to(device)
            # add micro for coding training loop
            optimizer.zero_grad()
            print(text.shape)
            output, hidden = model(text.unsqueeze(1))
            #print(output.shape, target.shape, target.view(-1).shape)
            loss = criterion(output, target.view(-1))
            loss.backward()
            optimizer.step()
            steps += 1
            running_loss += loss.item()

            # get accuracy
            _, predicted = torch.max(output, 1)
            print(predicted)
            #predicted = torch.round(output.squeeze())
            total += target.size(0)
            correct += (predicted == target).sum().item()

        train_loss.append(running_loss/len(train_loader))
        train_acc.append(correct/total)

        print(f'Epoch: {epoch + 1}, '
              f'Training Loss: {running_loss/len(train_loader):.4f}, '
              f'Training Accuracy: {100*correct/total: .2f}%')

        # evaluate on validation data
        model.eval()
        running_loss = 0.
        correct, total = 0, 0

        with torch.no_grad():
            for idx, batch in enumerate(valid_loader):
                text = batch["Sample"].to(device)
                print(type(text), text.shape)
                target = batch['Class'].to(device)
                target = torch.autograd.Variable(target).long()
                text, target = text.to(device), target.to(device)
                optimizer.zero_grad()
                output = model(text)
                loss = criterion(output, target)
                running_loss += loss.item()

                # get accuracy
                _, predicted = torch.max(output, 1)
                #predicted = torch.round(output.squeeze())
                total += target.size(0)
                correct += (predicted == target).sum().item()

        validation_loss.append(running_loss/len(valid_loader))
        validation_acc.append(correct/total)

        print(f'Validation Loss: {running_loss/len(valid_loader):.4f}, '
              f'Validation Accuracy: {100*correct/total: .2f}%')

    return train_loss, train_acc, validation_loss, validation_acc
This is how I call the training loop:
# Model hyperparameters
#vocab_size = len(word_array)
learning_rate = 1e-3
hidden_dim = 100
output_size = 11
input_size = 300
epochs = 10
n_layers = 2

# Initialize model, training and testing
set_seed(SEED)
vanilla_rnn_model = VanillaRNN(input_size, output_size, hidden_dim, n_layers)
vanilla_rnn_model.to(DEVICE)

vanilla_rnn_start_time = time.time()
vanilla_train_loss, vanilla_train_acc, vanilla_validation_loss, vanilla_validation_acc = train(vanilla_rnn_model,
                                                                                               DEVICE,
                                                                                               train_loader,
                                                                                               valid_loader,
                                                                                               epochs=epochs,
                                                                                               learning_rate=learning_rate)
This is how I create the data loaders:
# Splitting dataset
# define a batch_size, I'll use 4 as an example
batch_size = 1
train_dset = CustomDataset(X2, y) # create data set
train_loader = DataLoader(train_dset, batch_size=batch_size, shuffle=True) #load data with batch size
valid_dset = CustomDataset(X2, y)
valid_loader = DataLoader(valid_dset, batch_size=batch_size, shuffle=True)
g_seed = torch.Generator()
g_seed.manual_seed(SEED)
Full traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-23-bfd2f8f3456f> in <module>()
19 valid_loader,
20 epochs = epochs,
---> 21 learning_rate = learning_rate)
22 print("--- Time taken to train = %s seconds ---" % (time.time() - vanilla_rnn_start_time))
23 #test_accuracy = test(vanilla_rnn_model, DEVICE, test_iter)
3 frames
<ipython-input-22-16748701034f> in train(model, device, train_loader, valid_loader, epochs, learning_rate)
47 output, hidden = model(text.unsqueeze(1))
48 #print(output.shape, target.shape, target.view(-1).shape)
---> 49 loss = criterion(output, target.view(-1))
50 loss.backward()
51 optimizer.step()
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
1119 def forward(self, input: Tensor, target: Tensor) -> Tensor:
1120 return F.cross_entropy(input, target, weight=self.weight,
-> 1121 ignore_index=self.ignore_index, reduction=self.reduction)
1122
1123
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2822 if size_average is not None or reduce is not None:
2823 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2824 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2825
2826
ValueError: Expected input batch_size (1) to match target batch_size (11).
You should not be using .view(-1). This line:

loss = criterion(output, target.view(-1))

should be:

loss = criterion(output, target)

The view is effectively removing your batch dimension. With batch_size=1 it changes (1, 11) into (11,), and when you change batch_size to 11 it changes the shape from (11, 11) into (121,), hence the error.
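A quick way to see what that view is doing, assuming the targets really are shaped (batch_size, 11) as the error messages suggest:

import torch

# hypothetical targets shaped (batch_size, 11), matching the sizes in the errors
target_b1 = torch.zeros(1, 11)
target_b11 = torch.zeros(11, 11)

print(target_b1.view(-1).shape)   # torch.Size([11])  -> "target batch_size (11)"
print(target_b11.view(-1).shape)  # torch.Size([121]) -> "target batch_size (121)"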