How to calculate kernel dimensions from original image dimensions?
https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py
From reading https://www.cs.toronto.edu/~kriz/cifar.html, the CIFAR dataset consists of images that are each 32x32 pixels.
My understanding of the code:
self.conv1 = nn.Conv2d(3, 6, 5)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16*5*5, 120)
is:
self.conv1 = nn.Conv2d(3, 6, 5)   # 3 channels in, 6 channels out, kernel size of 5
self.conv2 = nn.Conv2d(6, 16, 5)  # 6 channels in, 16 channels out, kernel size of 5
self.fc1 = nn.Linear(16*5*5, 120) # 16*5*5 in features, 120 out features
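That reading of the positional arguments matches PyTorch's parameter names; spelled out with keywords (a small sketch for illustration, not from the original post):
import torch.nn as nn

# Same layers as above, with PyTorch's keyword names made explicit.
conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
fc1 = nn.Linear(in_features=16*5*5, out_features=120)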
And the following from resnet.py:
self.fc1 = nn.Linear(16*5*5, 120)
The following is from http://cs231n.github.io/convolutional-networks/:
Summary. To summarize, the Conv Layer:
- Accepts a volume of size W1×H1×D1
- Requires four hyperparameters: Number of filters K, their spatial extent F, the stride S, the amount of zero padding P.
- Produces a volume of size W2×H2×D2 where:
  W2 = (W1 − F + 2P)/S + 1
  H2 = (H1 − F + 2P)/S + 1 (i.e. width and height are computed equally by symmetry)
  D2 = K
- With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of (F⋅F⋅D1)⋅K weights and K biases.
- In the output volume, the d-th depth slice (of size W2×H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by the d-th bias.
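As a minimal sketch (not from the original post), that output-size formula can be written as a small Python helper, assuming the stride divides evenly so integer division is exact:

def conv_out_size(w, f, s=1, p=0):
    # W2 = (W1 - F + 2P)/S + 1, per the cs231n summary above
    return (w - f + 2 * p) // s + 1

print(conv_out_size(32, 5))  # 28: a 5x5 kernel with stride 1 and no padding shrinks 32 to 28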
From that summary I am trying to understand how a training image of size 32x32 (1024 pixels) is turned into a feature map of 16*5*5 (-> 400) that is fed to nn.Linear(16*5*5, 120).
From https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d I can see that the default stride is 1 and the default padding is 0.
What are the steps to get from an image size of 32*32 to 16*5*5? Can 16*5*5 be derived from those steps?
How is the spatial extent calculated from those steps?
Update:
Source code:
'''LeNet in PyTorch.'''
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
Taken from https://github.com/kuangliu/pytorch-cifar/blob/master/models/lenet.py
My understanding is that the convolution operation is applied to the image data once per kernel. So if 5 kernels are set, 5 convolutions are applied to the data, producing a 5-dimensional representation of the image.
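One way to see where 16*5*5 comes from is to push a dummy 32x32 input through the same sequence of operations and check the shape at each step (a minimal sketch; the layer arguments are copied from the LeNet code above):

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)           # one fake CIFAR image: batch, channels, height, width
out = F.relu(nn.Conv2d(3, 6, 5)(x))     # torch.Size([1, 6, 28, 28])
out = F.max_pool2d(out, 2)              # torch.Size([1, 6, 14, 14])
out = F.relu(nn.Conv2d(6, 16, 5)(out))  # torch.Size([1, 16, 10, 10])
out = F.max_pool2d(out, 2)              # torch.Size([1, 16, 5, 5])
out = out.view(out.size(0), -1)         # torch.Size([1, 400]), i.e. 16*5*5
print(out.shape)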
You did not provide enough information in your question (see ).
However, if I had to guess, then you have two pooling layers (each with stride 2) between your convolution layers:
- Input size 32x32 (3 channels)
- conv1 output size 28x28 (6 channels): the conv has no padding and a kernel size of 5, reducing the input size by 4.
- A pooling layer with stride 2, output size 14x14 (6 channels).
- conv2 output size 10x10 (16 channels).
- Another pooling layer with stride 2, output size 5x5 (16 channels).
- A fully connected layer (nn.Linear) connecting all 5x5x16 inputs to all 120 outputs.
A more comprehensive guide to estimating receptive fields can be found here.
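A minimal sketch checking the sizes listed above, using the formula quoted in the question (stride 1 and padding 0 for the convolutions; F.max_pool2d(out, 2) uses kernel size 2 with stride defaulting to 2):

w = 32
w = (w - 5) + 1    # conv1: 5x5 kernel, stride 1, no padding -> 28
w = w // 2         # 2x2 max pool, stride 2                  -> 14
w = (w - 5) + 1    # conv2                                    -> 10
w = w // 2         # 2x2 max pool                             -> 5
print(16 * w * w)  # 400, the in_features of nn.Linear(16*5*5, 120)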