How to fix the error where the target batch size does not match the input batch size when using the CrossEntropyLoss function?
I am training a CNN. When I build the loss with CrossEntropyLoss and train on the dataset, I get an error telling me that the batch sizes do not match.
Here is the main training code:
net = SimpleConvolutionalNetwork()
train_history, val_history = train(net, batch_size=32, n_epochs=10, learning_rate=0.001)
plot_losses(train_history, val_history)
Here is the network code:
class SimpleConvolutionalNetwork(nn.Module):
    # Q: why is the spatial size of the input unchanged after relu??
    def __init__(self) -> None:
        super(SimpleConvolutionalNetwork, self).__init__()
        # define a convolutional layer: 3 input channels -> 18 output channels
        self.conv1 = nn.Conv2d(3, 18, kernel_size=3, stride=1, padding=1)
        # define a max-pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        # define the fully-connected hidden layer and the output layer
        self.fc1 = nn.Linear(18*16*16, 64)
        self.fc2 = nn.Linear(64, 10)

    # Q: where is the pooling layer??
    def forward(self, x):
        # input shape: 3 (channels) x 32 x 32 (32x32 is the spatial size of each channel)
        # filter with conv1 defined in the constructor, then relu the filtered x
        x = F.relu(self.conv1(x))
        # now let 18*32*32 -> 18*16*16
        x = x.view(-1, 18*16*16)
        # two steps for 18*16*16 (4608 in total) -> 64:
        # apply fc1 first, then relu the output
        x = F.relu(self.fc1(x))
        # 64 -> 10 finally
        x = self.fc2(x)
        return x
In the train function, the failure occurs where the loss is computed. The full function is long, so only the main part is shown below:
def train(net, batch_size, n_epochs, learning_rate):
    ...
    # load the training dataset
    train_loader = get_train_loader(batch_size)
    # get the validation dataset
    val_loader = get_val_loader(batch_size)
    # number of minibatches per epoch
    n_minibatches = len(train_loader)
    # create the loss function and the optimizer
    criterion, optimizer = createLossAndOptimizer(net, learning_rate)

    train_history = []
    val_history = []
    training_start_time = time.time()
    best_error = np.inf
    best_model_path = "best_model_path"
    # GPU if possible
    net = net.to(device)

    for epoch in range(n_epochs):
        running_loss = 0.0
        print_every = n_minibatches
        start_time = time.time()
        total_train_loss = 0.0
        # step 1: train on the dataset
        for i, (inputs, labels) in enumerate(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            # print statistics
            running_loss += loss.item()
            total_train_loss += loss.item()
            # print progress periodically
            if (i + 1) % (print_every + 1) == 0:
                print("Epoch {}, {:d}% \t train_loss: {:.2f} took: {:.2f}s".format(
                    epoch + 1, int(100 * (i + 1) / n_minibatches), running_loss / print_every,
                    time.time() - start_time))
                running_loss = 0.0
                start_time = time.time()
        train_history.append(total_train_loss / len(train_loader))
...
The loss/optimizer construction and the dataset loading look like this:
def createLossAndOptimizer(net, learning_rate=0.001):
    # define a cross-entropy loss function
    criterion = nn.CrossEntropyLoss()
    # the Adam optimizer takes the network parameters and the learning rate
    optimizer = opt.Adam(net.parameters(), lr=learning_rate)
    return criterion, optimizer

def get_train_loader(batch_size):
    return th.utils.data.DataLoader(train_set, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)

def get_val_loader(batch_size):
    return th.utils.data.DataLoader(train_set, batch_size=batch_size, sampler=train_sampler, num_workers=num_workers)
But the error tells me that the input batch size is larger than the target batch size:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-07b692e7a2bb> in <module>()
173 net = SimpleConvolutionalNetwork()
174
--> 175 train_history, val_history = train(net, batch_size=32, n_epochs=10, learning_rate=0.001)
176
177 plot_losses(train_history, val_history)
3 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
2844 if size_average is not None or reduce is not None:
2845 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
2847
2848
ValueError: Expected input batch_size (128) to match target batch_size (32).
I guess the main cause is that the input somehow ends up 4 times the size of 'labels', so I must have set some parameter incorrectly, but I don't know how to fix it. Thanks for your answers.
After applying conv1 in the forward method of SimpleConvolutionalNetwork, the tensor x has shape (batch_size, 18, 32, 32). Therefore, after x = x.view(-1, 18 * 16 * 16), the shape of x becomes (batch_size * 4, 18 * 16 * 16), and since the subsequent fully-connected layers do not change this new batch dimension, the output has shape (batch_size * 4, 10). My suggestion is to apply the pooling right after the convolution, for example:
x = F.relu(self.conv1(x)) # after that x will have shape (batch_size, 18, 32, 32)
x = self.pool(x) # after that x will have shape (batch_size, 18, 16, 16)
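With that change, the whole forward method would look roughly like this (a sketch reusing the layers defined in the question's __init__; the comment on each line gives the resulting tensor shape):

def forward(self, x):
    # input: (batch_size, 3, 32, 32)
    x = F.relu(self.conv1(x))       # -> (batch_size, 18, 32, 32)
    x = self.pool(x)                # -> (batch_size, 18, 16, 16)
    x = x.view(-1, 18 * 16 * 16)    # -> (batch_size, 4608): batch dimension now intact
    x = F.relu(self.fc1(x))         # -> (batch_size, 64)
    x = self.fc2(x)                 # -> (batch_size, 10)
    return x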
The forward pass will then return a tensor of shape (batch_size, 10), and the batch size mismatch error will no longer occur.
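To see where the 128 in the error message comes from, you can reproduce the reshape on a dummy tensor (a quick sanity check, assuming a batch of 32 CIFAR-sized images as in the question):

import torch
x = torch.randn(32, 18, 32, 32)          # shape of conv1's output for a batch of 32
print(x.view(-1, 18 * 16 * 16).shape)    # torch.Size([128, 4608]) -- 4x the real batch size
print(x.view(x.size(0), -1).shape)       # torch.Size([32, 18432]) -- batch dimension preserved

As a more defensive alternative, flattening with x = x.view(x.size(0), -1) (or an nn.Flatten() module) always keeps the batch dimension first, so a wrong feature count would surface as a clear size error in fc1 instead of a silently corrupted batch size.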