How to get output_attentions of a pretrained DistilBERT model?

Question

I am using a pretrained DistilBERT model:

import tensorflow as tf
from transformers import TFDistilBertModel, DistilBertConfig

dbert = 'distilbert-base-uncased'

config = DistilBertConfig(max_position_embeddings=256 , dropout=0.2, 
                          attention_dropout=0.2, 
                          output_hidden_states=True,
                          output_attentions=True) #or true

dbert_model = TFDistilBertModel.from_pretrained(dbert, config)

input_ids_in = tf.keras.layers.Input(shape=(256,), name='input_id', dtype='int32')
input_masks_in = tf.keras.layers.Input(shape=(256,), name='attn_mask', dtype='int32') 

outputs = dbert_model([input_ids_in, input_masks_in], output_attentions = 1)

I am trying to get the output_attentions, but the output has length 1 and looks like this:

TFBaseModelOutput([('last_hidden_state', <KerasTensor: shape=(None, 256, 768) dtype=float32 (created by layer 'tf_distil_bert_model_6')>)])

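I set output_attentions=True in the config and also passed output_attentions=1 in the forward pass. Can anyone tell me what I am doing wrong?

Edit: I changed the max_position_embeddings config value from its default of 512 to 256. If I change the model instantiation to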

dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased',config=config)

it gives me the following error:

ValueError: cannot reshape array of size 393216 into shape (256,768)

768*512 is 393216, so the error may be related to the config code.
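A quick check of those numbers, as a minimal sketch. It assumes the pretrained checkpoint stores a 512 x 768 position-embedding matrix (inferred from the default of 512 mentioned above and the hidden size of 768 in the output shape); `can_reshape` is a hypothetical helper, not a library function.

```python
# Sizes taken from the post. The "pretrained" shape is an assumption based on
# the default max_position_embeddings of 512 and a hidden size of 768.
PRETRAINED_POSITIONS = 512
CONFIGURED_POSITIONS = 256
HIDDEN_SIZE = 768


def can_reshape(num_elements, shape):
    """Return True if a flat array of num_elements fits exactly into shape."""
    rows, cols = shape
    return num_elements == rows * cols


pretrained_elements = PRETRAINED_POSITIONS * HIDDEN_SIZE
target_shape = (CONFIGURED_POSITIONS, HIDDEN_SIZE)

print(pretrained_elements)                             # 393216
print(target_shape[0] * target_shape[1])               # 196608
print(can_reshape(pretrained_elements, target_shape))  # False
```

So the 393216 in the error message matches a 512-row matrix being forced into the 256-row shape requested by the config.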

Any ideas?

Answer 1

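I am posting the answer as suggested by @cronoik: I changed the code to dbert_model = TFDistilBertModel.from_pretrained('distilbert-base-uncased',config, output_attentions=True). With this change the output contains both the hidden states and the attentions.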
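My understanding of why this works, as a toy sketch rather than the library's actual code: I am assuming that from_pretrained has roughly the shape from_pretrained(name, *model_args, config=None, **kwargs), so a config passed positionally is not treated as the config, while keyword arguments such as output_attentions override the config loaded from the checkpoint. `toy_from_pretrained` and `DEFAULT_CONFIG` are hypothetical names made up for this illustration.

```python
# Toy stand-in, NOT the transformers source. It only mimics the assumed
# signature from_pretrained(name, *model_args, config=None, **kwargs).
DEFAULT_CONFIG = {"max_position_embeddings": 512, "output_attentions": False}


def toy_from_pretrained(name, *model_args, config=None, **kwargs):
    """Return the config that would be used, plus any stray positional args."""
    if config is None:
        # No keyword config given: start from the checkpoint defaults ...
        config = dict(DEFAULT_CONFIG)
        # ... and let matching keyword arguments override them.
        for key in list(kwargs):
            if key in config:
                config[key] = kwargs.pop(key)
    return config, model_args


my_config = {"max_position_embeddings": 256, "output_attentions": True}

# 1) Positional config, as in the question: it lands in model_args.
used, extra = toy_from_pretrained("distilbert-base-uncased", my_config)
print(used["output_attentions"], len(extra))  # False 1

# 2) Positional config plus output_attentions=True, as in this answer.
used, extra = toy_from_pretrained(
    "distilbert-base-uncased", my_config, output_attentions=True
)
print(used["output_attentions"], used["max_position_embeddings"])  # True 512

# 3) Keyword config: the 256-position setting is now really used.
used, extra = toy_from_pretrained("distilbert-base-uncased", config=my_config)
print(used["max_position_embeddings"])  # 256
```

If that reading is right, it would also explain the reshape error in the question: passing config=config by keyword makes the 256-position setting take effect, which no longer matches the pretrained weights. Keeping max_position_embeddings at its default of 512 and passing the config by keyword should then be an alternative, though I have not verified that here.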

Tags: python, tensorflow, tf.keras, distilbert, huggingface-transformers