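Keras: How to display attention weights in LSTM model
I built a text classification model using an LSTM with an attention layer. The model trains and performs well, but I cannot display the attention weights, that is, the importance assigned to each word of a review (the input text).
The code used for this model is: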
from keras import backend as K
from keras import initializers, regularizers, constraints
from keras.layers import Layer


def dot_product(x, kernel):
    # Backend-agnostic dot product between a tensor and a weight tensor.
    if K.backend() == 'tensorflow':
        return K.squeeze(K.dot(x, K.expand_dims(kernel)), axis=-1)
    else:
        return K.dot(x, kernel)


class AttentionWithContext(Layer):
    """
    Attention operation, with a context/query vector, for temporal data.
    "Hierarchical Attention Networks for Document Classification"
    by using a context vector to assist the attention
    # Input shape
        3D tensor with shape: (samples, steps, features).
    # Output shape
        2D tensor with shape: (samples, features).
    How to use:
    Just put it on top of an RNN Layer (GRU/LSTM/SimpleRNN) with return_sequences=True.
    The dimensions are inferred based on the output shape of the RNN.
    Note: The layer has been tested with Keras 2.0.6
    Example:
        model.add(LSTM(64, return_sequences=True))
        model.add(AttentionWithContext())
        # next add a Dense layer (for classification/regression) or whatever
    """

    def __init__(self,
                 W_regularizer=None, u_regularizer=None, b_regularizer=None,
                 W_constraint=None, u_constraint=None, b_constraint=None,
                 bias=True, **kwargs):
        self.supports_masking = True
        self.init = initializers.get('glorot_uniform')
        self.W_regularizer = regularizers.get(W_regularizer)
        self.u_regularizer = regularizers.get(u_regularizer)
        self.b_regularizer = regularizers.get(b_regularizer)
        self.W_constraint = constraints.get(W_constraint)
        self.u_constraint = constraints.get(u_constraint)
        self.b_constraint = constraints.get(b_constraint)
        self.bias = bias
        super(AttentionWithContext, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        # Weights are registered in the order W, b (optional), u.
        self.W = self.add_weight(shape=(input_shape[-1], input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_W'.format(self.name),
                                 regularizer=self.W_regularizer,
                                 constraint=self.W_constraint)
        if self.bias:
            self.b = self.add_weight(shape=(input_shape[-1],),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)
        self.u = self.add_weight(shape=(input_shape[-1],),
                                 initializer=self.init,
                                 name='{}_u'.format(self.name),
                                 regularizer=self.u_regularizer,
                                 constraint=self.u_constraint)
        super(AttentionWithContext, self).build(input_shape)

    def compute_mask(self, input, input_mask=None):
        # do not pass the mask to the next layers
        return None

    def call(self, x, mask=None):
        uit = dot_product(x, self.W)
        if self.bias:
            uit += self.b
        uit = K.tanh(uit)
        ait = dot_product(uit, self.u)
        a = K.exp(ait)
        # apply mask after the exp. will be re-normalized next
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            a *= K.cast(mask, K.floatx())
        # in some cases especially in the early stages of training the sum may be almost zero
        # and this results in NaN's. A workaround is to add a very small positive number ε to the sum.
        # a /= K.cast(K.sum(a, axis=1, keepdims=True), K.floatx())
        a /= K.cast(K.sum(a, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        # a now holds the per-step attention weights, shape (samples, steps)
        a = K.expand_dims(a)
        weighted_input = x * a
        return K.sum(weighted_input, axis=1)

    def compute_output_shape(self, input_shape):
        return input_shape[0], input_shape[-1]
from keras.models import Model, Sequential
from keras.layers import Activation, Dense, Embedding, Input, TimeDistributed
from keras.layers import LSTM, Bidirectional, Dropout

EMBEDDING_DIM = 100
max_seq_len = 118
batch_size = 256
num_epochs = 50


def BidLstm():
    # inp = Input(shape=(118, 100))
    # x = Embedding(max_features, embed_size, weights=[embedding_matrix],
    #               trainable=False)(inp)
    model1 = Sequential()
    model1.add(Dense(512, input_shape=(118, 100)))
    model1.add(Activation('relu'))
    # model1.add(Flatten())
    # model1.add(BatchNormalization(input_shape=(100,)))
    model1.add(Bidirectional(LSTM(100, activation="relu", return_sequences=True)))
    model1.add(Dropout(0.1))
    model1.add(TimeDistributed(Dense(200)))
    model1.add(AttentionWithContext())
    model1.add(Dropout(0.25))
    model1.add(Dense(4, activation="softmax"))
    model1.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                   metrics=['accuracy'])
    model1.summary()
    return model1
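You can use the get_weights() method of the custom layer to get a list of all its weights. You can find more information here.
You need to make the following change to your code when building the model:
model1.add(TimeDistributed(Dense(200)))
atn = AttentionWithContext()
model1.add(atn)
Then, after training, just use:
atn.get_weights()[index]
to extract the weight matrix W as a numpy array (I think index should be set to 0, but you will have to check this on your own model). You can then use pyplot's imshow/matshow method to display the matrix.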
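As a rough sketch of how to pick the right index: get_weights() returns the arrays in the order in which build() registers them with add_weight, which for the AttentionWithContext layer in the question is W, then b (only when bias=True), then u. The helper name_attention_weights below is hypothetical, and the random NumPy arrays are only stand-ins for a trained layer's weights; 200 matches the TimeDistributed(Dense(200)) output in the question.

```python
import numpy as np


def name_attention_weights(weight_list, bias=True):
    """Label the arrays returned by AttentionWithContext.get_weights().

    Hypothetical helper: assumes the order in which build() calls
    add_weight, i.e. W, then b (only if bias=True), then u.
    """
    names = ["W", "b", "u"] if bias else ["W", "u"]
    if len(weight_list) != len(names):
        raise ValueError(
            "expected %d arrays, got %d" % (len(names), len(weight_list)))
    return dict(zip(names, weight_list))


# Stand-ins for atn.get_weights() on a layer with 200 input features.
features = 200
rng = np.random.RandomState(0)
fake_weights = [
    rng.randn(features, features),  # W
    np.zeros(features),             # b
    rng.randn(features),            # u
]

named = name_attention_weights(fake_weights)
for name, array in named.items():
    print(name, array.shape)
```

Note that W, b, and u are parameters of the layer and are shared by every input; they are not the per-word attention weights of a particular review.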
Thanks for your edit.
Your solution returns the weights of the attention layer, but I am looking for the attention weights of the words.
I found another solution to this problem:
1. Define a function that computes the attention weights:
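import numpy as np
from keras import backend as K

def cal_att_weights(output, att_w):
    # att_w = [W, b, u], as returned by get_weights() on the attention layer
    # if model_name == 'HAN':
    eij = np.tanh(np.dot(output[0], att_w[0]) + att_w[1])
    eij = np.dot(eij, att_w[2])
    eij = eij.reshape((eij.shape[0], eij.shape[1]))
    ai = np.exp(eij)
    # normalize over the time steps of each sample
    weights = ai / np.sum(ai, axis=1, keepdims=True)
    return weights

# Backend function returning the input of the attention layer in test mode
# (learning phase 0). In the model above the attention layer is
# model1.layers[5], so its input is the output of model1.layers[4].
sent_before_att = K.function([model1.layers[0].input, K.learning_phase()], [model1.layers[4].output])
sent_att_w = model1.layers[5].get_weights()
test_seq = np.array(userinp).reshape(1, 118, 100)
out = sent_before_att([test_seq, 0])
att_weights = cal_att_weights(out, sent_att_w)  # shape (1, 118): one weight per word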
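Please see the GitHub repository here: https://github.com/FlorisHoogenboom/keras-han-for-docla
First, define the attention layer's weight computation explicitly.
Second, extract the output of the preceding layer and the weights of the attention layer, then combine them to obtain the per-word attention weights.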
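The two steps above can be sketched in plain NumPy so that the result mirrors what the AttentionWithContext layer computes internally, including the mask and the small epsilon added to the denominator. This is only a sketch under assumptions: attention_weights_np is a hypothetical name, eps=1e-7 is an assumed stand-in for K.epsilon(), and the random arrays replace a real layer input and real trained weights.

```python
import numpy as np


def attention_weights_np(h, W, b, u, mask=None, eps=1e-7):
    """Recompute per-step attention weights in NumPy.

    h:    (samples, steps, features) input of the attention layer
    W:    (features, features), b: (features,), u: (features,)
    mask: optional (samples, steps) array, 1 for real tokens, 0 for padding
    eps:  assumed stand-in for K.epsilon()
    """
    uit = np.tanh(np.dot(h, W) + b)  # (samples, steps, features)
    ait = np.dot(uit, u)             # (samples, steps)
    a = np.exp(ait)
    if mask is not None:
        # zero out padded steps before normalizing, as the layer does
        a = a * mask.astype(a.dtype)
    return a / (np.sum(a, axis=1, keepdims=True) + eps)


# Random stand-ins for a layer input and trained weights.
rng = np.random.RandomState(0)
samples, steps, features = 2, 5, 4
h = rng.randn(samples, steps, features)
W = rng.randn(features, features)
b = np.zeros(features)
u = rng.randn(features)
mask = np.array([[1, 1, 1, 0, 0],
                 [1, 1, 1, 1, 1]])

alpha = attention_weights_np(h, W, b, u, mask=mask)
print(alpha.shape)        # one weight per step of each sample
print(alpha.sum(axis=1))  # each row sums to approximately 1
```

Normalizing along axis=1 keeps the weights of different samples independent, so the same function works for a single review or a batch.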
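After reading the comprehensive answers above, I finally understand how to extract the weights of the attention layer. Overall, the ideas of @李想 and @Okorimi Manoury are both correct. The code segment from @Okorimi Manoury comes from the following link: Textual attention visualization.
Now, let me explain the process step by step: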
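(1) You should have a trained model. Load the model and extract the weights of the attention layer.
To extract the weights of a particular layer, you can use model.summary() to check the model architecture and find the layer's index. Then you can use:
layer_weights = model.layers[3].get_weights()  # assuming the attention layer is at index 3
layer_weights is a list. For word-level attention, for example, the list has three elements: W, b, and u. In other words, layer_weights[0] = W, layer_weights[1] = b, and layer_weights[2] = u.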
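For reference, these three arrays correspond to the word-level attention equations of the paper named in the layer's docstring, "Hierarchical Attention Networks for Document Classification". The block below restates them in the paper's notation as I understand it, where h_{it} is the output of the preceding layer at step t, and W, b, u play the roles of W_w, b_w, u_w:

```latex
\begin{align}
u_{it} &= \tanh(W_w h_{it} + b_w) \tag{5} \\
\alpha_{it} &= \frac{\exp(u_{it}^\top u_w)}{\sum_{t} \exp(u_{it}^\top u_w)} \tag{6} \\
s_i &= \sum_{t} \alpha_{it} h_{it} \tag{7}
\end{align}
```

The \alpha_{it} are the per-word attention weights. The layer only returns s_i, which is why the weights have to be recomputed from the layer's input.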
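(2) You also need the output of the layer just before the attention layer. In this example, that is the output of the layer at index 2. You can get it with the following code: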
new_model = Model(inputs=model.input, outputs=model.layers[2].output)
output_before_att = new_model.predict(x_test_sample) #extract layer output
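(3) In detail: suppose your input is a text segment of 100 words with 300-dimensional embeddings (input shape [100, 300]) and the layer at index 2 outputs 128 features. Then output_before_att has shape [1, 100, 128] for a single sample, since predict keeps the batch dimension. Correspondingly, layer_weights[0] (W) has shape [128, 128], and for the AttentionWithContext layer above layer_weights[1] (b) and layer_weights[2] (u) both have shape [128]. Then we need the following code: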
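eij = np.tanh(np.dot(output_before_att, layer_weights[0]) + layer_weights[1])  # Eq. (5) in the paper
eij = np.dot(eij, layer_weights[2])  # Eq. (6): score each step against the context vector u
eij = eij.reshape((eij.shape[0], eij.shape[1]))  # reshape to (samples, steps)
ai = np.exp(eij)  # Eq. (6): numerator
weights = ai / np.sum(ai, axis=1, keepdims=True)  # Eq. (6): normalize over the steps of each sample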
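weights then holds one attention weight (importance) for each of the 100 input words, with shape [1, 100] for a single sample. After that, you can visualize the attention weights.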
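One simple way to inspect the result is to pair each word with its weight and print a text bar chart. The sketch below is illustrative only: rank_tokens, the example tokens, the "<pad>" token, and the weight values are all made up for the demonstration, since the thread does not show how the input was tokenized or padded.

```python
import numpy as np


def rank_tokens(tokens, weights, pad_token="<pad>"):
    """Pair each token with its attention weight, highest weight first.

    tokens and pad_token are hypothetical: they stand for whatever
    tokenization and padding produced the model input.
    """
    weights = np.asarray(weights, dtype=float).ravel()
    if len(tokens) != len(weights):
        raise ValueError("need exactly one weight per token")
    pairs = [(tok, float(w)) for tok, w in zip(tokens, weights)
             if tok != pad_token]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up example values, shaped like `weights` for a single sample.
tokens = ["the", "battery", "life", "is", "terrible", "<pad>", "<pad>"]
weights = np.array([[0.05, 0.20, 0.10, 0.05, 0.55, 0.03, 0.02]])

for token, weight in rank_tokens(tokens, weights):
    bar = "#" * int(round(weight * 40))
    print("%-10s %.2f %s" % (token, weight, bar))
```

The same (token, weight) pairs can be passed to a plotting library, for instance as a single-row heatmap with pyplot's imshow.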
I hope this explanation helps.