Why are the last layers of the VGG_UNet segmentation model reshaped?

I want to solve a multi-class segmentation task with deep learning (in Python). Below is the summary of a vgg_unet model, mostly collected from GitHub. My dataset has 8 labels, so the last convolutional layer has 8 channels, one per class. My model summary is as follows:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 512, 512, 3) 0                                            
__________________________________________________________________________________________________
block1_conv1 (Conv2D)           (None, 512, 512, 64) 1792        input_1[0][0]                    
__________________________________________________________________________________________________
block1_conv2 (Conv2D)           (None, 512, 512, 64) 36928       block1_conv1[0][0]               
__________________________________________________________________________________________________
block1_pool (MaxPooling2D)      (None, 256, 256, 64) 0           block1_conv2[0][0]               
__________________________________________________________________________________________________
block2_conv1 (Conv2D)           (None, 256, 256, 128 73856       block1_pool[0][0]                
__________________________________________________________________________________________________
block2_conv2 (Conv2D)           (None, 256, 256, 128 147584      block2_conv1[0][0]               
__________________________________________________________________________________________________
block2_pool (MaxPooling2D)      (None, 128, 128, 128 0           block2_conv2[0][0]               
__________________________________________________________________________________________________
block3_conv1 (Conv2D)           (None, 128, 128, 256 295168      block2_pool[0][0]                
__________________________________________________________________________________________________
block3_conv2 (Conv2D)           (None, 128, 128, 256 590080      block3_conv1[0][0]               
__________________________________________________________________________________________________
block3_conv3 (Conv2D)           (None, 128, 128, 256 590080      block3_conv2[0][0]               
__________________________________________________________________________________________________
block3_pool (MaxPooling2D)      (None, 64, 64, 256)  0           block3_conv3[0][0]               
__________________________________________________________________________________________________
block4_conv1 (Conv2D)           (None, 64, 64, 512)  1180160     block3_pool[0][0]                
__________________________________________________________________________________________________
block4_conv2 (Conv2D)           (None, 64, 64, 512)  2359808     block4_conv1[0][0]               
__________________________________________________________________________________________________
block4_conv3 (Conv2D)           (None, 64, 64, 512)  2359808     block4_conv2[0][0]               
__________________________________________________________________________________________________
block4_pool (MaxPooling2D)      (None, 32, 32, 512)  0           block4_conv3[0][0]               
__________________________________________________________________________________________________
zero_padding2d (ZeroPadding2D)  (None, 34, 34, 512)  0           block4_pool[0][0]                
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 32, 32, 512)  2359808     zero_padding2d[0][0]             
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 32, 32, 512)  2048        conv2d[0][0]                     
__________________________________________________________________________________________________
up_sampling2d (UpSampling2D)    (None, 64, 64, 512)  0           batch_normalization[0][0]        
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 64, 64, 768)  0           up_sampling2d[0][0]              
                                                                 block3_pool[0][0]                
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 66, 66, 768)  0           concatenate[0][0]                
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 64, 64, 256)  1769728     zero_padding2d_1[0][0]           
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 64, 64, 256)  1024        conv2d_1[0][0]                   
__________________________________________________________________________________________________
up_sampling2d_1 (UpSampling2D)  (None, 128, 128, 256 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 128, 128, 384 0           up_sampling2d_1[0][0]            
                                                                 block2_pool[0][0]                
__________________________________________________________________________________________________
zero_padding2d_2 (ZeroPadding2D (None, 130, 130, 384 0           concatenate_1[0][0]              
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 128, 128, 128 442496      zero_padding2d_2[0][0]           
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 128, 128, 128 512         conv2d_2[0][0]                   
__________________________________________________________________________________________________
up_sampling2d_2 (UpSampling2D)  (None, 256, 256, 128 0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256, 256, 192 0           up_sampling2d_2[0][0]            
                                                                 block1_pool[0][0]                
__________________________________________________________________________________________________
zero_padding2d_3 (ZeroPadding2D (None, 258, 258, 192 0           concatenate_2[0][0]              
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 64) 110656      zero_padding2d_3[0][0]           
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 256, 256, 64) 256         conv2d_3[0][0]                   
__________________________________________________________________________________________________
up_sampling2d_3 (UpSampling2D)  (None, 512, 512, 64) 0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 512, 512, 64) 36928       up_sampling2d_3[0][0]            
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 512, 512, 64) 256         conv2d_4[0][0]                   
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 512, 512, 8)  4616        batch_normalization_4[0][0]      
__________________________________________________________________________________________________
activation (Activation)         (None, 512, 512, 8)  0           conv2d_5[0][0]                   
==================================================================================================
Total params: 12,363,592
Trainable params: 12,361,544
Non-trainable params: 2,048
__________________________________________________________________________________________________

However, on the original GitHub page, the author reshapes the output of the conv2d_5 layer (the last convolutional layer in my model) into a single spatial dimension, as shown below:

__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 512, 512, 8)  4616        batch_normalization_4[0][0]      
__________________________________________________________________________________________________
reshape (Reshape)               (None, 262144, None) 0           conv2d_5[0][0]                   
__________________________________________________________________________________________________
activation (Activation)         (None, 262144, None) 0           reshape[0][0]                    
==================================================================================================  

My question is: why is this kind of reshape used here, and what are its purpose and benefits? Moreover, when I predict and visualize an image, I have to reshape the output back to (512, 512, 8) before processing it further. So what does this reshape (the Reshape layer in the summary above) actually gain, and what drawbacks would my model have if I left it out?
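For context, this is roughly how I currently undo the flattening after prediction (a minimal numpy sketch; `pred_flat` here is random placeholder data standing in for the reshaped model's per-pixel softmax output):

```python
import numpy as np

# Placeholder for the flattened prediction of the reshaped model:
# one softmax vector per pixel, 512 * 512 = 262144 pixels, 8 classes.
pred_flat = np.random.rand(262144, 8)

# Recover the spatial layout, then take the per-pixel argmax
# to get a (512, 512) label map for visualization.
pred = pred_flat.reshape(512, 512, 8)
label_map = pred.argmax(axis=-1)

print(label_map.shape)  # (512, 512)
```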

Nothing goes wrong without the reshape; in fact, the operation is unnecessary and redundant in this case.

I asked myself the same question when I first dug into image segmentation. Some repositories (most of them) omit this step, while others reshape first and only then add the sigmoid/softmax activation.

In my experience, I have seen no advantage, no better results, and no strong mathematical reason for applying the reshape. So I see no problem with omitting it from your code.
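The redundancy is easy to check numerically: softmax is applied independently per pixel along the class axis, so flattening the spatial dimensions first changes nothing. A minimal numpy sketch (small toy shapes standing in for (512, 512, 8)):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

h, w, c = 4, 4, 8          # small stand-in for (512, 512, 8)
logits = np.random.randn(h, w, c)

# Path A: softmax directly on the (H, W, C) tensor, as without Reshape.
direct = softmax(logits, axis=-1)

# Path B: flatten to (H*W, C), softmax, reshape back, as with Reshape.
reshaped = softmax(logits.reshape(h * w, c), axis=-1).reshape(h, w, c)

print(np.allclose(direct, reshaped))  # True
```

Because the class axis is the last axis in both layouts, the two paths produce identical probabilities; the Reshape layer only changes how the same numbers are indexed.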