How to extract features using VGG16 model and use them as input for another model(say resnet, vit-keras, etc)?

I am fairly new to deep learning and image classification. I want to use VGG16 to extract features from images and use them as input to my vit-keras model. Here is my code:

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras.applications.vgg16 import VGG16

vgg_model = VGG16(include_top=False, weights = 'imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))

for layer in vgg_model.layers:
    layer.trainable = False

from vit_keras import vit
vit_model = vit.vit_b16(
        image_size = IMAGE_SIZE,
        activation = 'sigmoid',
        pretrained = True,
        include_top = False,
        pretrained_top = False,
        classes = 2)

model = tf.keras.Sequential([
        vgg_model,
        vit_model,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation = tfa.activations.gelu),
        tf.keras.layers.Dense(256, activation = tfa.activations.gelu),
        tf.keras.layers.Dense(64, activation = tfa.activations.gelu),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(1, 'sigmoid')
    ],
    name = 'vision_transformer')

model.summary()

However, I get the following error:

ValueError: Input 0 of layer embedding is incompatible with the layer: expected axis -1 of input shape to have value 3 but received input with shape (None, 8, 8, 512)

I assume this error occurs when combining VGG16 and vit-keras. How can I fix this error for this setup?

You cannot feed the output of the VGG16 model into vit_model, because both models expect an input of shape (224, 224, 3), or whatever input shape you defined. The problem is that the VGG16 feature extractor outputs a feature map of shape (8, 8, 512). You could try upsampling/reshaping/resizing that output to match the expected shape, but I would not recommend it. Instead, just feed the same input to both models and concatenate their results.
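To see the mismatch concretely, you can print the output shape of the VGG16 backbone and the input shape the ViT expects. A minimal check (the weights=None / pretrained=False settings and IMAGE_SIZE = 224 are only to keep the snippet light; they are not from the original code):

import tensorflow as tf
from vit_keras import vit

IMAGE_SIZE = 224  # with IMAGE_SIZE = 256 the VGG output becomes (None, 8, 8, 512), matching the error above

# VGG16 without its classifier head: five pooling stages shrink 224x224 down to 7x7
vgg_check = tf.keras.applications.vgg16.VGG16(include_top=False, weights=None,
                                              input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
print(vgg_check.output_shape)   # (None, 7, 7, 512)

# ViT-B/16 expects a raw 3-channel image, not a 512-channel feature map
vit_check = vit.vit_b16(image_size=IMAGE_SIZE, activation='sigmoid', pretrained=False,
                        include_top=False, pretrained_top=False, classes=2)
print(vit_check.input_shape)    # (None, 224, 224, 3)

Here is a working example of the concatenation approach: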

import tensorflow as tf
import tensorflow_addons as tfa
from vit_keras import vit

IMAGE_SIZE = 224

# Frozen VGG16 backbone: extracts a (7, 7, 512) feature map from each 224x224 image
vgg_model = tf.keras.applications.vgg16.VGG16(include_top=False, weights = 'imagenet', input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
for layer in vgg_model.layers:
    layer.trainable = False

# Pretrained ViT-B/16 without its classification head: outputs a 768-dim class-token embedding
vit_model = vit.vit_b16(
        image_size = IMAGE_SIZE,
        activation = 'sigmoid',
        pretrained = True,
        include_top = False,
        pretrained_top = False,
        classes = 2)

# Feed the same image to both backbones, flatten the VGG feature map,
# and concatenate it with the ViT embedding
inputs = tf.keras.layers.Input((IMAGE_SIZE, IMAGE_SIZE, 3))
vgg_output = tf.keras.layers.Flatten()(vgg_model(inputs))
vit_output = vit_model(inputs)
x = tf.keras.layers.Concatenate(axis=-1)([vgg_output, vit_output])

# Small classification head on top of the combined features
x = tf.keras.layers.Dense(512, activation = tfa.activations.gelu)(x)
x = tf.keras.layers.Dense(256, activation = tfa.activations.gelu)(x)
x = tf.keras.layers.Dense(64, activation = tfa.activations.gelu)(x)
x = tf.keras.layers.BatchNormalization()(x)
outputs = tf.keras.layers.Dense(1, 'sigmoid')(x)
model = tf.keras.Model(inputs, outputs)
print(model.summary())
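For reference, if you really do want to chain the two models, here is a minimal sketch of the resize approach mentioned above. It reuses vgg_model, vit_model, IMAGE_SIZE and tfa from the example above; the 1x1 Conv2D and Resizing layers are my own illustrative choices, not part of the original answer (Resizing needs TF 2.6+, otherwise use tf.keras.layers.experimental.preprocessing.Resizing). I still would not recommend this over concatenation:

# Not recommended: squeeze the VGG feature map back into a fake 3-channel "image" for the ViT
inputs = tf.keras.layers.Input((IMAGE_SIZE, IMAGE_SIZE, 3))
features = vgg_model(inputs)                                                # (None, 7, 7, 512)
x = tf.keras.layers.Conv2D(3, kernel_size=1, activation='relu')(features)  # 512 channels -> 3
x = tf.keras.layers.Resizing(IMAGE_SIZE, IMAGE_SIZE)(x)                    # 7x7 -> 224x224
x = vit_model(x)                                                            # ViT now receives (224, 224, 3)
x = tf.keras.layers.Dense(64, activation = tfa.activations.gelu)(x)
outputs = tf.keras.layers.Dense(1, 'sigmoid')(x)
chained_model = tf.keras.Model(inputs, outputs)
print(chained_model.summary())

The 1x1 convolution is randomly initialized, so the ViT's pretrained patch embedding no longer sees anything like natural images; that is the main reason this tends to work worse than simply concatenating the two feature sets.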