Why are the last layers of the VGG_UNet segmentation model reshaped?
I want to solve a multi-class segmentation task using deep learning (in Python). Below is the summary of a vgg_unet model, collected mainly from GitHub. My dataset has 8 labels, so the last convolutional layer has 8 channels, one for classifying each class.
My model summary is as follows:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 512, 512, 3) 0
__________________________________________________________________________________________________
block1_conv1 (Conv2D) (None, 512, 512, 64) 1792 input_1[0][0]
__________________________________________________________________________________________________
block1_conv2 (Conv2D) (None, 512, 512, 64) 36928 block1_conv1[0][0]
__________________________________________________________________________________________________
block1_pool (MaxPooling2D) (None, 256, 256, 64) 0 block1_conv2[0][0]
__________________________________________________________________________________________________
block2_conv1 (Conv2D) (None, 256, 256, 128 73856 block1_pool[0][0]
__________________________________________________________________________________________________
block2_conv2 (Conv2D) (None, 256, 256, 128 147584 block2_conv1[0][0]
__________________________________________________________________________________________________
block2_pool (MaxPooling2D) (None, 128, 128, 128 0 block2_conv2[0][0]
__________________________________________________________________________________________________
block3_conv1 (Conv2D) (None, 128, 128, 256 295168 block2_pool[0][0]
__________________________________________________________________________________________________
block3_conv2 (Conv2D) (None, 128, 128, 256 590080 block3_conv1[0][0]
__________________________________________________________________________________________________
block3_conv3 (Conv2D) (None, 128, 128, 256 590080 block3_conv2[0][0]
__________________________________________________________________________________________________
block3_pool (MaxPooling2D) (None, 64, 64, 256) 0 block3_conv3[0][0]
__________________________________________________________________________________________________
block4_conv1 (Conv2D) (None, 64, 64, 512) 1180160 block3_pool[0][0]
__________________________________________________________________________________________________
block4_conv2 (Conv2D) (None, 64, 64, 512) 2359808 block4_conv1[0][0]
__________________________________________________________________________________________________
block4_conv3 (Conv2D) (None, 64, 64, 512) 2359808 block4_conv2[0][0]
__________________________________________________________________________________________________
block4_pool (MaxPooling2D) (None, 32, 32, 512) 0 block4_conv3[0][0]
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D) (None, 34, 34, 512) 0 block4_pool[0][0]
__________________________________________________________________________________________________
conv2d (Conv2D) (None, 32, 32, 512) 2359808 zero_padding2d[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 32, 32, 512) 2048 conv2d[0][0]
__________________________________________________________________________________________________
up_sampling2d (UpSampling2D) (None, 64, 64, 512) 0 batch_normalization[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 64, 64, 768) 0 up_sampling2d[0][0]
block3_pool[0][0]
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 66, 66, 768) 0 concatenate[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 64, 64, 256) 1769728 zero_padding2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 64, 64, 256) 1024 conv2d_1[0][0]
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D) (None, 128, 128, 256 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 128, 128, 384 0 up_sampling2d_1[0][0]
block2_pool[0][0]
__________________________________________________________________________________________________
zero_padding2d_2 (ZeroPadding2D (None, 130, 130, 384 0 concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 128, 128, 128 442496 zero_padding2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 128, 128, 128 512 conv2d_2[0][0]
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D) (None, 256, 256, 128 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate) (None, 256, 256, 192 0 up_sampling2d_2[0][0]
block1_pool[0][0]
__________________________________________________________________________________________________
zero_padding2d_3 (ZeroPadding2D (None, 258, 258, 192 0 concatenate_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 256, 256, 64) 110656 zero_padding2d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 256, 256, 64) 256 conv2d_3[0][0]
__________________________________________________________________________________________________
up_sampling2d_3 (UpSampling2D) (None, 512, 512, 64) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 512, 512, 64) 36928 up_sampling2d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 512, 512, 64) 256 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 512, 512, 8) 4616 batch_normalization_4[0][0]
__________________________________________________________________________________________________
activation (Activation) (None, 512, 512, 8) 0 conv2d_5[0][0]
==================================================================================================
Total params: 12,363,592
Trainable params: 12,361,544
Non-trainable params: 2,048
__________________________________________________________________________________________________
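For reference, the tail of the summary above can be reproduced with something like the following (a minimal sketch in tf.keras; the 3x3 kernel is inferred from the parameter count, 64 * 3 * 3 * 8 + 8 = 4,616):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Activation

# Stand-in for the (None, 512, 512, 64) output of batch_normalization_4.
features = Input(shape=(512, 512, 64))

# 4,616 parameters, matching conv2d_5 in the summary above.
logits = Conv2D(8, (3, 3), padding='same')(features)  # -> (None, 512, 512, 8)

# Per-pixel softmax over the 8 class channels.
probs = Activation('softmax')(logits)                 # -> (None, 512, 512, 8)

head = Model(features, probs)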
However, on the main GitHub page, the author reshapes the output of the conv2d_5 layer (the last convolutional layer in my model) so that its spatial dimensions are flattened into a single one, as shown below.
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 512, 512, 8) 4616 batch_normalization_4[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (None, 262144, None) 0 conv2d_5[0][0]
__________________________________________________________________________________________________
activation (Activation) (None, 262144, None) 0 reshape[0][0]
==================================================================================================
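In code, that variant of the head presumably looks something like this (a sketch inferred from the summary alone; the repository may write the Reshape arguments slightly differently, which would explain the None in the class axis):

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Reshape, Activation

scores = Input(shape=(512, 512, 8))      # stand-in for the conv2d_5 output
flat = Reshape((512 * 512, 8))(scores)   # -> (None, 262144, 8): one row per pixel
probs = Activation('softmax')(flat)      # softmax over the 8 class scores of each pixel
head = Model(scores, probs)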
My question is: why is this kind of reshape used here, and what are its purpose and benefits? Also, when I predict on any image and want to visualize the result, I have to reshape the output back to (512, 512, 8) before processing it further, roughly as sketched below. So, what is the advantage of this kind of reshape (the Reshape layer in the summary above), and what drawback would my model suffer if I did not use it?
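For context, this is the post-processing step I mean (a minimal sketch; the random array stands in for a real prediction):

import numpy as np

# Hypothetical flattened prediction for one image, standing in for model.predict(x)[0],
# which has shape (262144, 8) when the model ends with the Reshape layer.
pred = np.random.rand(512 * 512, 8).astype('float32')

# Undo the flattening before visualization, then pick the most likely class per pixel.
pred = pred.reshape(512, 512, 8)
label_map = np.argmax(pred, axis=-1)  # (512, 512) integer mask with values 0..7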
There is no problem without the reshape; in fact, the reshape operation is unnecessary, a redundant operation in this case.
I asked myself the same thing when I started digging into image segmentation. Some repositories omit this step (most of them), and some reshape first and only then add the sigmoid/softmax activation.
In my experience, I have not seen any advantage, better results, or strong mathematical reason for which the reshape should be applied. So if you omit it in your code, I see no problem with that.
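A quick way to convince yourself: in both layouts the softmax runs independently over the 8 class scores of each pixel, so reshaping before or after the activation yields identical probabilities. A minimal sketch with random scores:

import numpy as np
import tensorflow as tf

scores = tf.random.uniform((1, 512, 512, 8))  # raw per-pixel class scores

# Variant 1: softmax applied directly on the (1, 512, 512, 8) tensor.
probs_direct = tf.nn.softmax(scores, axis=-1)

# Variant 2: flatten to (1, 262144, 8), apply softmax, then reshape back.
flat = tf.reshape(scores, (1, 512 * 512, 8))
probs_reshaped = tf.reshape(tf.nn.softmax(flat, axis=-1), (1, 512, 512, 8))

print(np.allclose(probs_direct.numpy(), probs_reshaped.numpy()))  # True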