How could we use Bahdanau attention in a stacked LSTM model?

Question

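My goal is to use attention in a stacked LSTM model, but I don't know how to add Keras's AdditiveAttention mechanism between the encoder and decoder layers. Say we have an input layer, an encoder, a decoder, and a dense classification layer, and we want the decoder to attend to all of the encoder's hidden states (h = [h1, ..., hT]) when deriving its output. Is there a high-level way to code this with Keras? For example,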

input_layer = Input(shape=(T, f))
x = input_layer  
x = LSTM(num_neurons1, return_sequences=True)(x)
# Adding attention here, but I don't know how?
x = LSTM(num_neurons2)(x)
output_layer = Dense(1, 'sigmoid')(x)
model = Model(input_layer, output_layer)
...

I think using it like this is wrong: x = AdditiveAttention(x, x). Am I right?

Answer 1

Maybe this helps with your problem?

This is a character-level classification model that combines an LSTM with attention:

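First, create a custom layer for the attention:

# Assumes tf.keras; K is the Keras backend module.
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # input_shape is (batch, timesteps, features)
        self.W = self.add_weight(name='attention_weight', shape=(input_shape[-1], 1),
                                 initializer='random_normal', trainable=True)
        self.b = self.add_weight(name='attention_bias', shape=(input_shape[1], 1),
                                 initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self, x):
        # Alignment scores, passed through tanh: (batch, timesteps, 1)
        e = K.tanh(K.dot(x, self.W) + self.b)
        # Remove the trailing dimension of size 1: (batch, timesteps)
        e = K.squeeze(e, axis=-1)
        # Compute the attention weights over the timesteps
        alpha = K.softmax(e)
        # Restore the trailing dimension so the weights broadcast against x
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector as the weighted sum over timesteps
        context = x * alpha
        context = K.sum(context, axis=1)
        return context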
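To make the shapes in the call method easier to follow, here is a minimal NumPy sketch of the same computation (tanh scores, softmax over the timesteps, weighted sum). The function name attention_pool and the random inputs are only illustrative and are not part of the answer's code.

```python
import numpy as np


def attention_pool(x, W, b):
    """x: (batch, T, d), W: (d, 1), b: (T, 1) -> context (batch, d), alpha (batch, T)."""
    e = np.tanh(x @ W + b)                    # alignment scores, (batch, T, 1)
    e = np.squeeze(e, axis=-1)                # (batch, T)
    e = e - e.max(axis=1, keepdims=True)      # shift for a numerically stable softmax
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    context = (x * alpha[..., None]).sum(axis=1)  # weighted sum over timesteps
    return context, alpha


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 4))   # hypothetical batch: 2 sequences, 5 steps, 4 features
W = rng.normal(size=(4, 1))
b = np.zeros((5, 1))
context, alpha = attention_pool(x, W, b)
print(context.shape, alpha.shape)  # (2, 4) (2, 5)
```

Each row of alpha sums to 1, so the context vector is a convex combination of the hidden states in that sequence.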

LEN_CHA = 64     # number of characters
LEN_INPUT = 110  # depends on the longest sentence; shorter ones are padded with zeros

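from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, Bidirectional, LSTM, Dense

# EMBEDDING_DIM is not defined in this answer; set it to your embedding size.
def LSTM_model_attention(Labels=3):
    model = Sequential()
    model.add(Embedding(LEN_CHA, EMBEDDING_DIM, input_length=LEN_INPUT))
    model.add(SpatialDropout1D(0.7))
    model.add(Bidirectional(LSTM(256, return_sequences=True)))
    model.add(attention())
    model.add(Dense(Labels, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
    return model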

LSTM_attention = LSTM_model_attention()
LSTM_attention.summary()
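As for the built-in layer from the question: AdditiveAttention has to be instantiated first and then called on a list [query, value], so AdditiveAttention(x, x) is indeed not the right call. Below is a rough sketch of one way to wire it into the stacked model, assuming tf.keras and assuming both LSTMs use the same number of units (the layer adds query and key, so their last dimensions must match). The helper name build_attention_lstm and the sizes are made up for illustration. Note that this applies attention after the decoder LSTM has run, which is simpler than the original Bahdanau formulation where the context vector is fed into the decoder at every step.

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import (
    Input, LSTM, Dense, AdditiveAttention, Concatenate, Reshape, Flatten)


def build_attention_lstm(T, f, units):
    """Encoder LSTM -> decoder LSTM -> additive attention over encoder states."""
    inputs = Input(shape=(T, f))
    # Encoder: keep every hidden state h_1..h_T, shape (batch, T, units).
    enc_seq = LSTM(units, return_sequences=True)(inputs)
    # Decoder: final hidden state only, shape (batch, units).
    dec_last = LSTM(units)(enc_seq)
    # The attention layer expects a 3-D query, so add a length-1 time axis.
    query = Reshape((1, units))(dec_last)
    # Instantiate the layer, then call it on [query, value].
    context = AdditiveAttention()([query, enc_seq])  # (batch, 1, units)
    context = Flatten()(context)                     # (batch, units)
    # Combine the decoder state with the attention context before classifying.
    merged = Concatenate()([dec_last, context])
    outputs = Dense(1, activation="sigmoid")(merged)
    return Model(inputs, outputs)


# Hypothetical sizes, chosen only to check that the graph builds and runs.
model = build_attention_lstm(T=20, f=8, units=32)
probs = model.predict(np.random.rand(4, 20, 8).astype("float32"), verbose=0)
print(probs.shape)  # (4, 1)
```

If the encoder and decoder need different sizes, one option is to project one of them with a Dense layer before the attention call.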

lstm

keras

attention-model