Tensorflow 2: How to fit a subclassed model that returns multiple values in the call method?

I built the following model via model subclassing in TensorFlow 2:

from tensorflow.keras import Model, Input
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.layers import Flatten, Dense

class Detector(Model):
    
    def __init__(self, num_classes=3, name="DenseNet201"):
        super(Detector, self).__init__(name=name)
        self.feature_extractor = DenseNet201(
            include_top=False,
            weights="imagenet",
        )
        self.feature_extractor.trainable = False
        self.flatten_layer = Flatten()
        self.prediction_layer = Dense(num_classes, activation=None)

    def call(self, inputs):
        x = preprocess_input(inputs)
        extracted_feature = self.feature_extractor(x, training=False)
        x = self.flatten_layer(extracted_feature)
        y_hat = self.prediction_layer(x)
        return extracted_feature, y_hat

The next step is to compile and fit the model. The model compiles fine, but when fitting it on my image generator (built from ImageDataGenerator), I ran into the following error:

InvalidArgumentError: Incompatible shapes: [64,18,18] vs. [64,1]
[[node Equal (defined at :19) ]] [Op:__inference_train_function_32187]
Function call stack: train_function

history = detector.fit(
    train_generator,
    epochs=1,
    validation_data=val_generator,
    callbacks=callbacks
)

This makes sense: during detector.fit(), TensorFlow doesn't know whether the prediction is y_hat or extracted_feature, so the loss gets matched against both outputs and the shapes clash, hence the error. So, what is the correct implementation of detector.fit for my case?

Following this, you should first train your model with (let's say) one input and one output. Later, if you want to compute grad-cam, you would pick some intermediate layer of your base model (not the final output of the base model), and in that case you need to build your feature extractor separately. For example:

from tensorflow import keras

# (let's say: one input and one output)
# use for training
base_model = keras.applications.DenseNet201(include_top=False,
                                            weights="imagenet",
                                            input_shape=(224, 224, 3))
x = base_model.output
x = keras.layers.GlobalAveragePooling2D()(x)  # dense / dropout / bn / whatever head
out = keras.layers.Dense(3, activation="softmax")(x)
model = keras.Model(base_model.input, out)

# inference / we need to compute grad cam
new_model = keras.Model(model.input,
                        [model.layers[15].output, model.output])

The model above is used for training; later, at inference time, if you need to compute grad-cam based on some layer, for example layer 15, then you need to build new_model with the appropriate outputs. Hope this makes things clear. For more information on feature extraction, see the official docs: Extract and reuse nodes in the graph of layers. FYI, the exact same thing is happening here as I mentioned earlier. Also, check this official code example, where you will see exactly the same approach.
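
To make that last step concrete, here is a minimal sketch of how new_model could be used to get the gradients Grad-CAM needs. The batch img_array and the choice of layer index are assumptions for illustration; the rest follows the usual Grad-CAM recipe.

import tensorflow as tf

# Sketch: gradients of the top predicted class w.r.t. the intermediate output.
# `img_array` is an assumed, already-preprocessed batch of shape (N, H, W, 3).
with tf.GradientTape() as tape:
    conv_output, preds = new_model(img_array)
    top_class = tf.argmax(preds[0])
    class_channel = preds[:, top_class]

# Starting point for the usual Grad-CAM channel weighting.
grads = tape.gradient(class_channel, conv_output)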


However, I think there is another approach that may suit you better. Namely, since you are using a subclassed model, you can use the privileged training argument of the call() method. Typically, at training time this is True, and at inference time it is False. Based on that, we can return the desired outputs accordingly. Here is the complete code example:

import tensorflow as tf 

# get some data
data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

datagen_kwargs = dict(rescale=1./255, validation_split=.20)
dataflow_kwargs = dict(target_size=(64, 64),
                       batch_size=16,
                       interpolation="bilinear")

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
      rotation_range=40,
      horizontal_flip=True,
      width_shift_range=0.2, height_shift_range=0.2,
      shear_range=0.2, zoom_range=0.2,
      **datagen_kwargs)

train_generator = train_datagen.flow_from_directory(
    data_dir, subset="training", shuffle=True, **dataflow_kwargs)
for image, label in train_generator:
    print(image.shape, image.dtype)
    print(label.shape, label.dtype)
    print(label[:4])
    break

(16, 64, 64, 3) float32
(16, 5) float32
[[0. 0. 0. 0. 1.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]

Here, we do the trick based on the boolean value of training in the call method.

from tensorflow.keras import Model
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.applications.densenet import preprocess_input
from tensorflow.keras.layers import Flatten, Dense

class Detector(Model):
    def __init__(self, num_classes=5, name="DenseNet201"):
        super(Detector, self).__init__(name=name)
        self.feature_extractor = DenseNet201(
            include_top=False,
            weights="imagenet",
        )
        self.feature_extractor.trainable = False
        self.flatten_layer = Flatten()
        self.prediction_layer = Dense(num_classes, activation='softmax')

    def call(self, inputs, training=None):
        x = preprocess_input(inputs)
        extracted_feature = self.feature_extractor(x, training=False)
        x = self.flatten_layer(extracted_feature)
        y_hat = self.prediction_layer(x)

        # fit() calls the model with training=True; predict() and direct
        # inference calls use training=False and get both outputs
        if training:
            return y_hat
        else:
            return [y_hat, extracted_feature]
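
As a quick sanity check, the two branches of call() hand back different structures; here is a sketch, assuming the train_generator built above:

# Sketch: exercise both code paths of call() on one batch.
det = Detector()
images, labels = next(iter(train_generator))

y_train = det(images, training=True)        # single tensor, shape (16, 5)
y_hat, feats = det(images, training=False)  # two outputs: (16, 5) and (16, 2, 2, 1920)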

Train

det = Detector()
det.compile(loss='categorical_crossentropy', 
            optimizer='adam', metrics=['acc'])

train_step = train_generator.samples // train_generator.batch_size

det.fit(train_generator, 
      steps_per_epoch=train_step,
      validation_data=train_generator, 
      validation_steps=train_step,
      epochs=2, verbose=2)
Epoch 1/2
37s 139ms/step - loss: 1.7543 - acc: 0.2650 - val_loss: 1.5310 - val_acc: 0.3764
Epoch 2/2
21s 115ms/step - loss: 1.4913 - acc: 0.3915 - val_loss: 1.3066 - val_acc: 0.4667
<tensorflow.python.keras.callbacks.History at 0x7fa2890b1790>

Evaluate

det.evaluate(train_generator, 
      steps=train_step)

4s 76ms/step - loss: 1.3066 - acc: 0.4667
[1.3065541982650757, 0.46666666865348816]

Inference

Here, we will get two outputs from this model (unlike the single output we got at training time).

y_hat, base_feature = det.predict(train_generator, 
                        steps=train_step)

y_hat.shape, base_feature.shape
((720, 5), (720, 2, 2, 1920))

Now you can do grad-cam, or anything else that needs such feature maps.
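
For example, a minimal Grad-CAM-style computation from the model's two inference-time outputs could look like the sketch below; the pooling-and-weighting is the standard Grad-CAM recipe, and the variable names are illustrative.

import tensorflow as tf

# Sketch: Grad-CAM from the two inference-time outputs of `det`.
images, labels = next(iter(train_generator))
images = tf.convert_to_tensor(images)

with tf.GradientTape() as tape:
    tape.watch(images)  # record ops from the input onward
    y_hat, feature_maps = det(images, training=False)
    class_idx = tf.argmax(y_hat[0])
    class_score = y_hat[:, class_idx]

# Gradient of the class score w.r.t. the feature maps: (16, 2, 2, 1920)
grads = tape.gradient(class_score, feature_maps)
weights = tf.reduce_mean(grads, axis=(1, 2))            # (16, 1920)
cam = tf.einsum('bijc,bc->bij', feature_maps, weights)  # (16, 2, 2)
cam = tf.nn.relu(cam)  # keep only positive contributions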