How to use ConvLSTM2D with VGG16

I am trying to use ConvLSTM2D together with VGG16 using the following code:

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Dense, TimeDistributed, ConvLSTM2D
from keras.models import Model

video = Input(shape=(no_of_frames, img_width, img_height, channels))
cnn_base = VGG16(input_shape=(img_width, img_height, channels), weights="imagenet",
                 include_top=False)
cnn_base.trainable = False  # freeze VGG16, use it only as a per-frame feature extractor
encoded_frames = TimeDistributed(cnn_base)(video)
encoded_sequence = ConvLSTM2D(64, kernel_size=(7, 7), strides=(2, 2), padding='same',
                              return_sequences=True)(encoded_frames)
hidden_layer_1 = Dense(activation="relu", units=512)(encoded_sequence)
hidden_layer_2 = Dense(activation="relu", units=20)(hidden_layer_1)
outputs = Dense(2, activation="softmax")(hidden_layer_2)
model = Model([video], outputs)

Running the code gives the following error message:

Traceback (most recent call last):
  File "/home/vislab/PycharmProjects/Firefront/ConvLstm_Classification.py", line 75, in
    callbacks=[checkpoint, early])
  File "/home/vislab/anaconda3/envs/Firefront/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/vislab/anaconda3/envs/Firefront/lib/python3.6/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/vislab/anaconda3/envs/Firefront/lib/python3.6/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/home/vislab/anaconda3/envs/Firefront/lib/python3.6/site-packages/keras/engine/training.py", line 1211, in train_on_batch
    class_weight=class_weight)
  File "/home/vislab/anaconda3/envs/Firefront/lib/python3.6/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/home/vislab/anaconda3/envs/Firefront/lib/python3.6/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected time_distributed_5 to have 3 dimensions, but got array with shape (20, 2)

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Can anyone suggest how I should proceed?

Thanks.

You need return_sequences=False. You are not classifying each individual frame, you are classifying the whole video.

Then you also need a Flatten (only works for a fixed image size) or a global pooling layer before the Dense layers to get rid of the extra dimensions.
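For reference, a minimal sketch of the adjusted model with those two changes applied. It reuses the placeholder variables from the question (no_of_frames, img_width, img_height, channels), and GlobalAveragePooling2D is just one of the pooling options mentioned above; Flatten would work too if the image size is fixed.

from keras.applications.vgg16 import VGG16
from keras.layers import (Input, Dense, TimeDistributed, ConvLSTM2D,
                          GlobalAveragePooling2D)
from keras.models import Model

video = Input(shape=(no_of_frames, img_width, img_height, channels))
cnn_base = VGG16(input_shape=(img_width, img_height, channels), weights="imagenet",
                 include_top=False)
cnn_base.trainable = False
encoded_frames = TimeDistributed(cnn_base)(video)
# return_sequences=False keeps only the last state: one feature map per video
encoded_sequence = ConvLSTM2D(64, kernel_size=(7, 7), strides=(2, 2), padding='same',
                              return_sequences=False)(encoded_frames)
# global pooling removes the spatial dimensions, leaving (batch, 64)
pooled = GlobalAveragePooling2D()(encoded_sequence)
hidden_layer_1 = Dense(512, activation="relu")(pooled)
hidden_layer_2 = Dense(20, activation="relu")(hidden_layer_1)
outputs = Dense(2, activation="softmax")(hidden_layer_2)
model = Model(video, outputs)

The final output shape is then (None, 2), so a target array of shape (batch_size, 2), like the (20, 2) array in your error message, matches the model output.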