Lasagne: use image inputs as the initial hidden state of an LSTMLayer

Question

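I am working on an image captioning project. In Lasagne (Theano), I want to use a batch of image features of shape (batch_size, 512) as the initial hidden state of an LSTMLayer. The sequence input to the LSTMLayer is a batch of text sequences of shape (batch_size, max_sequence_length, 512). I noticed that the LSTMLayer in Lasagne has a hid_init parameter. Does anyone know how to use it with the LSTMLayer? Do I need to implement a custom LSTMLayer myself?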

Answer 1

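You do not need to set the h_0 parameter, because h_0 uses c0 (see the LSTM diagram and note the connection from h0 to c0). So you only need to set c0, which is the cell_init argument: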

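from lasagne.layers import LSTMLayer

# l_word_embeddings, l_mask and the image-feature layer are assumed to be defined elsewhere
decoder = LSTMLayer(l_word_embeddings,
                    num_units=LSTM_UNITS,  # must equal 512 to match the image features
                    cell_init=your_image_features_layer_512_shape,  # this is c0
                    mask_input=l_mask)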

You can set c0 to a layer or to an array (see the Lasagne LSTM documentation).
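
To make the relationship between the initial cell state and the hidden state concrete, here is a minimal NumPy sketch of a single LSTM step. It is not Lasagne's implementation: it assumes a plain LSTM without peephole connections (Lasagne's LSTMLayer enables peepholes by default), uses random weights, and all names (lstm_step, W, U, b, image_features) are made up for illustration. It only shows that features placed in c0 already change the first hidden state h1, even when h0 is zero.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a plain LSTM (no peepholes).

    x_t: (batch, input_dim), h_prev and c_prev: (batch, n)
    W: (input_dim, 4 * n), U: (n, 4 * n), b: (4 * n,)
    """
    n = h_prev.shape[1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, 0 * n:1 * n])   # input gate
    f = sigmoid(z[:, 1 * n:2 * n])   # forget gate
    g = np.tanh(z[:, 2 * n:3 * n])   # candidate cell update
    o = sigmoid(z[:, 3 * n:4 * n])   # output gate
    c_t = f * c_prev + i * g         # c0 enters here through the forget gate
    h_t = o * np.tanh(c_t)           # so h1 depends on c0 via c1
    return h_t, c_t


rng = np.random.default_rng(0)
batch_size, input_dim, num_units = 4, 512, 512

# Hypothetical random parameters, only for illustration.
W = rng.normal(scale=0.01, size=(input_dim, 4 * num_units))
U = rng.normal(scale=0.01, size=(num_units, 4 * num_units))
b = np.zeros(4 * num_units)

x_1 = rng.normal(size=(batch_size, input_dim))              # first word embedding
image_features = rng.normal(size=(batch_size, num_units))   # stand-in for image features
zeros = np.zeros((batch_size, num_units))

# Case A: image features as c0, zero h0 (what the answer suggests).
h_a, c_a = lstm_step(x_1, zeros, image_features, W, U, b)
# Case B: both initial states zero (the default initialisation).
h_b, c_b = lstm_step(x_1, zeros, zeros, W, U, b)
# Case C: image features as h0, zero c0 (what the question asks about).
h_c, c_c = lstm_step(x_1, image_features, zeros, W, U, b)

print("c0 changes h1:", not np.allclose(h_a, h_b))
print("h0 changes h1:", not np.allclose(h_c, h_b))
```

Both initial states influence the first output, but through different paths: c0 is scaled by the forget gate and added to the new cell state, while h0 only enters through the gate pre-activations. Which one works better for captioning is an empirical question this sketch does not answer.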

Happy to discuss further.


theano

deep-learning

lasagne

theano-cuda