How to add additional data to CNN+LSTM network

I have the following network (a pretrained CNN + LSTM that classifies videos):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Dense, LSTM, TimeDistributed
from tensorflow.keras.models import Model

frames, channels, rows, columns = 5, 3, 224, 224

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))

cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=True)  # <=== include_top=True
cnn_base.trainable = False

# layers[-3] is fc1, a 4096-unit fully connected layer
cnn = Model(cnn_base.input, cnn_base.layers[-3].output, name="VGG_fm")
encoded_frames = TimeDistributed(cnn, name="encoded_frames")(video)
encoded_sequence = LSTM(256, name="encoded_seqeunce")(encoded_frames)
hidden_layer = Dense(1024, activation="relu", name="hidden_layer")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)

It looks like this:

Now I want to feed an additional 1-D vector of 784 features per video into the last layer. I tried replacing the last two lines with:

encoding_input = Input(shape=(784,), name="Encoding", dtype='float')
sentence_features = Dense(units=60, name='sentence_features')(encoding_input)
x = concatenate([sentence_features, hidden_layer])
outputs = Dense(10, activation="softmax")(x)

But I get this error:

ValueError: Graph disconnected: cannot obtain value for tensor Tensor("Sentence-Input-Encoding_3:0", shape=(None, 784), dtype=float32) at layer "sentence_features". The following previous layers were accessed without issue: ['encoded_frames', 'encoded_seqeunce']

Any suggestions?

Your network now has two inputs... don't forget to pass both of them to your model:

model = Model([video,encoding_input], outputs)

Full example:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Input, Dense, LSTM, TimeDistributed, concatenate
from tensorflow.keras.models import Model

frames, channels, rows, columns = 5, 3, 224, 224

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))

cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=True)
cnn_base.trainable = False

cnn = Model(cnn_base.input, cnn_base.layers[-3].output, name="VGG_fm")
encoded_frames = TimeDistributed(cnn, name="encoded_frames")(video)
encoded_sequence = LSTM(256, name="encoded_seqeunce")(encoded_frames)
hidden_layer = Dense(1024, activation="relu", name="hidden_layer")(encoded_sequence)

encoding_input = Input(shape=(784,), name="Encoding", dtype='float')
sentence_features = Dense(units=60, name='sentence_features')(encoding_input)
x = concatenate([sentence_features, hidden_layer])
outputs = Dense(10, activation="softmax")(x)

model = Model([video, encoding_input], outputs)  # <=== two inputs
model.summary()
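When calling `fit` or `predict` on a multi-input model, the data must be passed as a list in the same order as the inputs were given to `Model(...)`. Here is a minimal runnable sketch of that pattern using small stand-in dense layers instead of the VGG16/LSTM branch (the shapes and layer sizes are hypothetical, chosen only so the example runs quickly without downloading ImageNet weights):

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Stand-in for the LSTM branch output (hypothetical 8-dim feature vector)
video_features = Input(shape=(8,), name="video_features")
# The extra per-video feature vector, as in the question
encoding_input = Input(shape=(784,), name="Encoding")

hidden = Dense(16, activation="relu")(video_features)
sentence_features = Dense(4, name="sentence_features")(encoding_input)
x = concatenate([sentence_features, hidden])
outputs = Dense(10, activation="softmax")(x)

# Both inputs are listed, in order, when building the model...
model = Model([video_features, encoding_input], outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")

# ...and the same order is used when feeding data
x1 = np.random.rand(2, 8).astype("float32")
x2 = np.random.rand(2, 784).astype("float32")
preds = model.predict([x1, x2])
print(preds.shape)  # (2, 10)
```

The same `[x1, x2]` list structure applies to `model.fit([x1, x2], y, ...)`.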