Can I use VGG16 for one-channel images?

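I have just started learning TensorFlow (2.1.0) and Keras (2.3.7) with Python 3.7.7.

I want to use the VGG16 network for semantic segmentation of black-and-white images (200x200x1).

I have been using this network, whose original input_size was (224, 224, 3):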

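# Imports assume the Keras API bundled with TensorFlow 2.x (tensorflow.keras).
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

def vgg16_encoder_decoder(input_size = (200,200,1)):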
    #################################
    # Encoder
    #################################
    inputs = Input(input_size, name = 'input')

    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_1')(inputs)
    conv1 = Conv2D(64, (3, 3), activation = 'relu', padding = 'same', name ='conv1_2')(conv1)
    pool1 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_1')(conv1)

    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_1')(pool1)
    conv2 = Conv2D(128, (3, 3), activation = 'relu', padding = 'same', name ='conv2_2')(conv2)
    pool2 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_2')(conv2)

    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_1')(pool2)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_2')(conv3)
    conv3 = Conv2D(256, (3, 3), activation = 'relu', padding = 'same', name ='conv3_3')(conv3)
    pool3 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_3')(conv3)

    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_1')(pool3)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_2')(conv4)
    conv4 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv4_3')(conv4)
    pool4 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_4')(conv4)

    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_1')(pool4)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_2')(conv5)
    conv5 = Conv2D(512, (3, 3), activation = 'relu', padding = 'same', name ='conv5_3')(conv5)
    pool5 = MaxPooling2D(pool_size = (2,2), strides = (2,2), name = 'pool_5')(conv5)

    #################################
    # Decoder
    #################################
    #conv1 = Conv2DTranspose(512, (2, 2), strides = 2, name = 'conv1')(pool5)

    upsp1 = UpSampling2D(size = (2,2), name = 'upsp1')(pool5)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', name = 'conv6_1')(upsp1)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', name = 'conv6_2')(conv6)
    conv6 = Conv2D(512, 3, activation = 'relu', padding = 'same', name = 'conv6_3')(conv6)

    upsp2 = UpSampling2D(size = (2,2), name = 'upsp2')(conv6)
    conv7 = Conv2D(512, 3, activation = 'relu', padding = 'same', name = 'conv7_1')(upsp2)
    conv7 = Conv2D(512, 3, activation = 'relu', padding = 'same', name = 'conv7_2')(conv7)
    conv7 = Conv2D(512, 3, activation = 'relu', padding = 'same', name = 'conv7_3')(conv7)

    upsp3 = UpSampling2D(size = (2,2), name = 'upsp3')(conv7)
    conv8 = Conv2D(256, 3, activation = 'relu', padding = 'same', name = 'conv8_1')(upsp3)
    conv8 = Conv2D(256, 3, activation = 'relu', padding = 'same', name = 'conv8_2')(conv8)
    conv8 = Conv2D(256, 3, activation = 'relu', padding = 'same', name = 'conv8_3')(conv8)

    upsp4 = UpSampling2D(size = (2,2), name = 'upsp4')(conv8)
    conv9 = Conv2D(128, 3, activation = 'relu', padding = 'same', name = 'conv9_1')(upsp4)
    conv9 = Conv2D(128, 3, activation = 'relu', padding = 'same', name = 'conv9_2')(conv9)

    upsp5 = UpSampling2D(size = (2,2), name = 'upsp5')(conv9)
    conv10 = Conv2D(64, 3, activation = 'relu', padding = 'same', name = 'conv10_1')(upsp5)
    conv10 = Conv2D(64, 3, activation = 'relu', padding = 'same', name = 'conv10_2')(conv10)

    conv11 = Conv2D(3, 3, activation = 'relu', padding = 'same', name = 'conv11')(conv10)

    model = Model(inputs = inputs, outputs = conv11, name = 'vgg-16_encoder_decoder')

    return model

Model summary:

Model: "vgg-16_encoder_decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 200, 200, 1)       0
_________________________________________________________________
conv1_1 (Conv2D)             (None, 200, 200, 64)      640
_________________________________________________________________
conv1_2 (Conv2D)             (None, 200, 200, 64)      36928
_________________________________________________________________
pool_1 (MaxPooling2D)        (None, 100, 100, 64)      0
_________________________________________________________________
conv2_1 (Conv2D)             (None, 100, 100, 128)     73856
_________________________________________________________________
conv2_2 (Conv2D)             (None, 100, 100, 128)     147584
_________________________________________________________________
pool_2 (MaxPooling2D)        (None, 50, 50, 128)       0
_________________________________________________________________
conv3_1 (Conv2D)             (None, 50, 50, 256)       295168
_________________________________________________________________
conv3_2 (Conv2D)             (None, 50, 50, 256)       590080
_________________________________________________________________
conv3_3 (Conv2D)             (None, 50, 50, 256)       590080
_________________________________________________________________
pool_3 (MaxPooling2D)        (None, 25, 25, 256)       0
_________________________________________________________________
conv4_1 (Conv2D)             (None, 25, 25, 512)       1180160
_________________________________________________________________
conv4_2 (Conv2D)             (None, 25, 25, 512)       2359808
_________________________________________________________________
conv4_3 (Conv2D)             (None, 25, 25, 512)       2359808
_________________________________________________________________
pool_4 (MaxPooling2D)        (None, 12, 12, 512)       0
_________________________________________________________________
conv5_1 (Conv2D)             (None, 12, 12, 512)       2359808
_________________________________________________________________
conv5_2 (Conv2D)             (None, 12, 12, 512)       2359808
_________________________________________________________________
conv5_3 (Conv2D)             (None, 12, 12, 512)       2359808
_________________________________________________________________
pool_5 (MaxPooling2D)        (None, 6, 6, 512)         0
_________________________________________________________________
upsp1 (UpSampling2D)         (None, 12, 12, 512)       0
_________________________________________________________________
conv6_1 (Conv2D)             (None, 12, 12, 512)       2359808
_________________________________________________________________
conv6_2 (Conv2D)             (None, 12, 12, 512)       2359808
_________________________________________________________________
conv6_3 (Conv2D)             (None, 12, 12, 512)       2359808
_________________________________________________________________
upsp2 (UpSampling2D)         (None, 24, 24, 512)       0
_________________________________________________________________
conv7_1 (Conv2D)             (None, 24, 24, 512)       2359808
_________________________________________________________________
conv7_2 (Conv2D)             (None, 24, 24, 512)       2359808
_________________________________________________________________
conv7_3 (Conv2D)             (None, 24, 24, 512)       2359808
_________________________________________________________________
upsp3 (UpSampling2D)         (None, 48, 48, 512)       0
_________________________________________________________________
conv8_1 (Conv2D)             (None, 48, 48, 256)       1179904
_________________________________________________________________
conv8_2 (Conv2D)             (None, 48, 48, 256)       590080
_________________________________________________________________
conv8_3 (Conv2D)             (None, 48, 48, 256)       590080
_________________________________________________________________
upsp4 (UpSampling2D)         (None, 96, 96, 256)       0
_________________________________________________________________
conv9_1 (Conv2D)             (None, 96, 96, 128)       295040
_________________________________________________________________
conv9_2 (Conv2D)             (None, 96, 96, 128)       147584
_________________________________________________________________
upsp5 (UpSampling2D)         (None, 192, 192, 128)     0
_________________________________________________________________
conv10_1 (Conv2D)            (None, 192, 192, 64)      73792
_________________________________________________________________
conv10_2 (Conv2D)            (None, 192, 192, 64)      36928
_________________________________________________________________
conv11 (Conv2D)              (None, 192, 192, 3)       1731
=================================================================
Total params: 31,787,523
Trainable params: 31,787,523
Non-trainable params: 0
_________________________________________________________________

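The last convolutional layer returns a shape of (192, 192, 3), but I need it to return an image with shape (200, 200, 1).

I think I can change the last convolutional layer like this to get a one-channel image: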

conv11 = Conv2D(1, 3, activation = 'relu', padding = 'same', name = 'conv11')(conv10)

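But I don't know whether this is correct, because I have been reading about the VGG16 network and it works with 3-channel images.

Can I use VGG16 for one-channel images?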

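What you have read about VGG working with three-channel (RGB) images applies only to the pretrained model, which was trained on the ImageNet dataset, and that dataset contains only color images. Since you are not using the pretrained model, this limitation does not apply to you.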
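To make this concrete, here is a minimal sketch in plain Python (no framework needed) of how the parameter count of a convolution depends on the number of input channels. The helper name `conv2d_params` is made up for illustration; the formula is the usual one for a 2D convolution with a bias term.

```python
def conv2d_params(in_channels, filters, kernel_size=3, use_bias=True):
    """Number of trainable parameters in a standard 2D convolution."""
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# First layer (conv1_1, 64 filters of size 3x3) with different input depths.
grayscale = conv2d_params(in_channels=1, filters=64)   # 640, as in the summary above
rgb = conv2d_params(in_channels=3, filters=64)         # 1792

# Every later layer only sees the 64 feature maps produced by conv1_1,
# so its size is the same whichever input depth is used.
conv1_2 = conv2d_params(in_channels=64, filters=64)    # 36928, as in the summary above
```

Only the kernel of the first layer changes shape with the input depth. That is why ImageNet weights for that layer expect three channels, while a network trained from scratch has no such constraint.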

So you can use one, three, or any number of input or output channels.
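The spatial size is a separate matter from the channel count. A rough sketch, assuming the layer settings in your code (2x2 max pooling with stride 2 and the default 'valid' padding, followed by 2x2 upsampling), shows where the 192 in your summary comes from and which input sizes would survive the round trip. The helper names `trace_sizes` and `padded_size` are hypothetical.

```python
import math


def trace_sizes(size, n_stages=5):
    """Follow one spatial dimension through n poolings and n upsamplings."""
    sizes = [size]
    for _ in range(n_stages):
        size = size // 2          # 2x2 max pooling, stride 2: odd sizes are floored
        sizes.append(size)
    for _ in range(n_stages):
        size = size * 2           # UpSampling2D(size=(2, 2)) doubles the size
        sizes.append(size)
    return sizes


def padded_size(size, n_stages=5):
    """Smallest size >= `size` that is divisible by 2 ** n_stages."""
    factor = 2 ** n_stages
    return math.ceil(size / factor) * factor


print(trace_sizes(200))
# [200, 100, 50, 25, 12, 6, 12, 24, 48, 96, 192]
print(padded_size(200))
# 224
print(trace_sizes(224)[-1])
# 224
```

The loss happens at pool_4, where 25 is floored to 12. One option, then, would be to pad the 200x200 input by 12 pixels on each side (for example with a `ZeroPadding2D` layer), run the network at 224x224, and crop the output back to 200x200 with `Cropping2D`. Combined with the `Conv2D(1, ...)` final layer from your question, that should give a (200, 200, 1) output.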