huggingface sequence classification unfreezing layers
I am using longformer for sequence classification (a binary problem).
I have downloaded the required files:
    # load model and tokenizer and define length of the text sequence
    from transformers import LongformerForSequenceClassification, LongformerTokenizerFast

    model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096',
                                                                gradient_checkpointing=False,
                                                                attention_window=512)
    tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096', max_length=1024)
Then, as shown here, I ran the code below:
    for name, param in model.named_parameters():
        print(name, param.requires_grad)
longformer.embeddings.word_embeddings.weight True
longformer.embeddings.position_embeddings.weight True
longformer.embeddings.token_type_embeddings.weight True
longformer.embeddings.LayerNorm.weight True
longformer.embeddings.LayerNorm.bias True
longformer.encoder.layer.0.attention.self.query.weight True
longformer.encoder.layer.0.attention.self.query.bias True
longformer.encoder.layer.0.attention.self.key.weight True
longformer.encoder.layer.0.attention.self.key.bias True
longformer.encoder.layer.0.attention.self.value.weight True
longformer.encoder.layer.0.attention.self.value.bias True
longformer.encoder.layer.0.attention.self.query_global.weight True
longformer.encoder.layer.0.attention.self.query_global.bias True
longformer.encoder.layer.0.attention.self.key_global.weight True
longformer.encoder.layer.0.attention.self.key_global.bias True
longformer.encoder.layer.0.attention.self.value_global.weight True
longformer.encoder.layer.0.attention.self.value_global.bias True
longformer.encoder.layer.0.attention.output.dense.weight True
longformer.encoder.layer.0.attention.output.dense.bias True
longformer.encoder.layer.0.attention.output.LayerNorm.weight True
longformer.encoder.layer.0.attention.output.LayerNorm.bias True
longformer.encoder.layer.0.intermediate.dense.weight True
longformer.encoder.layer.0.intermediate.dense.bias True
longformer.encoder.layer.0.output.dense.weight True
longformer.encoder.layer.0.output.dense.bias True
longformer.encoder.layer.0.output.LayerNorm.weight True
longformer.encoder.layer.0.output.LayerNorm.bias True
longformer.encoder.layer.1.attention.self.query.weight True
longformer.encoder.layer.1.attention.self.query.bias True
longformer.encoder.layer.1.attention.self.key.weight True
longformer.encoder.layer.1.attention.self.key.bias True
longformer.encoder.layer.1.attention.self.value.weight True
longformer.encoder.layer.1.attention.self.value.bias True
longformer.encoder.layer.1.attention.self.query_global.weight True
longformer.encoder.layer.1.attention.self.query_global.bias True
longformer.encoder.layer.1.attention.self.key_global.weight True
longformer.encoder.layer.1.attention.self.key_global.bias True
longformer.encoder.layer.1.attention.self.value_global.weight True
longformer.encoder.layer.1.attention.self.value_global.bias True
longformer.encoder.layer.1.attention.output.dense.weight True
longformer.encoder.layer.1.attention.output.dense.bias True
longformer.encoder.layer.1.attention.output.LayerNorm.weight True
longformer.encoder.layer.1.attention.output.LayerNorm.bias True
longformer.encoder.layer.1.intermediate.dense.weight True
longformer.encoder.layer.1.intermediate.dense.bias True
longformer.encoder.layer.1.output.dense.weight True
longformer.encoder.layer.1.output.dense.bias True
longformer.encoder.layer.1.output.LayerNorm.weight True
longformer.encoder.layer.1.output.LayerNorm.bias True
longformer.encoder.layer.2.attention.self.query.weight True
longformer.encoder.layer.2.attention.self.query.bias True
longformer.encoder.layer.2.attention.self.key.weight True
longformer.encoder.layer.2.attention.self.key.bias True
longformer.encoder.layer.2.attention.self.value.weight True
longformer.encoder.layer.2.attention.self.value.bias True
longformer.encoder.layer.2.attention.self.query_global.weight True
longformer.encoder.layer.2.attention.self.query_global.bias True
longformer.encoder.layer.2.attention.self.key_global.weight True
longformer.encoder.layer.2.attention.self.key_global.bias True
longformer.encoder.layer.2.attention.self.value_global.weight True
longformer.encoder.layer.2.attention.self.value_global.bias True
longformer.encoder.layer.2.attention.output.dense.weight True
longformer.encoder.layer.2.attention.output.dense.bias True
longformer.encoder.layer.2.attention.output.LayerNorm.weight True
longformer.encoder.layer.2.attention.output.LayerNorm.bias True
longformer.encoder.layer.2.intermediate.dense.weight True
longformer.encoder.layer.2.intermediate.dense.bias True
longformer.encoder.layer.2.output.dense.weight True
longformer.encoder.layer.2.output.dense.bias True
longformer.encoder.layer.2.output.LayerNorm.weight True
longformer.encoder.layer.2.output.LayerNorm.bias True
longformer.encoder.layer.3.attention.self.query.weight True
longformer.encoder.layer.3.attention.self.query.bias True
longformer.encoder.layer.3.attention.self.key.weight True
longformer.encoder.layer.3.attention.self.key.bias True
longformer.encoder.layer.3.attention.self.value.weight True
longformer.encoder.layer.3.attention.self.value.bias True
longformer.encoder.layer.3.attention.self.query_global.weight True
longformer.encoder.layer.3.attention.self.query_global.bias True
longformer.encoder.layer.3.attention.self.key_global.weight True
longformer.encoder.layer.3.attention.self.key_global.bias True
longformer.encoder.layer.3.attention.self.value_global.weight True
longformer.encoder.layer.3.attention.self.value_global.bias True
longformer.encoder.layer.3.attention.output.dense.weight True
longformer.encoder.layer.3.attention.output.dense.bias True
longformer.encoder.layer.3.attention.output.LayerNorm.weight True
longformer.encoder.layer.3.attention.output.LayerNorm.bias True
longformer.encoder.layer.3.intermediate.dense.weight True
longformer.encoder.layer.3.intermediate.dense.bias True
longformer.encoder.layer.3.output.dense.weight True
longformer.encoder.layer.3.output.dense.bias True
longformer.encoder.layer.3.output.LayerNorm.weight True
longformer.encoder.layer.3.output.LayerNorm.bias True
longformer.encoder.layer.4.attention.self.query.weight True
longformer.encoder.layer.4.attention.self.query.bias True
longformer.encoder.layer.4.attention.self.key.weight True
longformer.encoder.layer.4.attention.self.key.bias True
longformer.encoder.layer.4.attention.self.value.weight True
longformer.encoder.layer.4.attention.self.value.bias True
longformer.encoder.layer.4.attention.self.query_global.weight True
longformer.encoder.layer.4.attention.self.query_global.bias True
longformer.encoder.layer.4.attention.self.key_global.weight True
longformer.encoder.layer.4.attention.self.key_global.bias True
longformer.encoder.layer.4.attention.self.value_global.weight True
longformer.encoder.layer.4.attention.self.value_global.bias True
longformer.encoder.layer.4.attention.output.dense.weight True
longformer.encoder.layer.4.attention.output.dense.bias True
longformer.encoder.layer.4.attention.output.LayerNorm.weight True
longformer.encoder.layer.4.attention.output.LayerNorm.bias True
longformer.encoder.layer.4.intermediate.dense.weight True
longformer.encoder.layer.4.intermediate.dense.bias True
longformer.encoder.layer.4.output.dense.weight True
longformer.encoder.layer.4.output.dense.bias True
longformer.encoder.layer.4.output.LayerNorm.weight True
longformer.encoder.layer.4.output.LayerNorm.bias True
longformer.encoder.layer.5.attention.self.query.weight True
longformer.encoder.layer.5.attention.self.query.bias True
longformer.encoder.layer.5.attention.self.key.weight True
longformer.encoder.layer.5.attention.self.key.bias True
longformer.encoder.layer.5.attention.self.value.weight True
longformer.encoder.layer.5.attention.self.value.bias True
longformer.encoder.layer.5.attention.self.query_global.weight True
longformer.encoder.layer.5.attention.self.query_global.bias True
longformer.encoder.layer.5.attention.self.key_global.weight True
longformer.encoder.layer.5.attention.self.key_global.bias True
longformer.encoder.layer.5.attention.self.value_global.weight True
longformer.encoder.layer.5.attention.self.value_global.bias True
longformer.encoder.layer.5.attention.output.dense.weight True
longformer.encoder.layer.5.attention.output.dense.bias True
longformer.encoder.layer.5.attention.output.LayerNorm.weight True
longformer.encoder.layer.5.attention.output.LayerNorm.bias True
longformer.encoder.layer.5.intermediate.dense.weight True
longformer.encoder.layer.5.intermediate.dense.bias True
longformer.encoder.layer.5.output.dense.weight True
longformer.encoder.layer.5.output.dense.bias True
longformer.encoder.layer.5.output.LayerNorm.weight True
longformer.encoder.layer.5.output.LayerNorm.bias True
longformer.encoder.layer.6.attention.self.query.weight True
longformer.encoder.layer.6.attention.self.query.bias True
longformer.encoder.layer.6.attention.self.key.weight True
longformer.encoder.layer.6.attention.self.key.bias True
longformer.encoder.layer.6.attention.self.value.weight True
longformer.encoder.layer.6.attention.self.value.bias True
longformer.encoder.layer.6.attention.self.query_global.weight True
longformer.encoder.layer.6.attention.self.query_global.bias True
longformer.encoder.layer.6.attention.self.key_global.weight True
longformer.encoder.layer.6.attention.self.key_global.bias True
longformer.encoder.layer.6.attention.self.value_global.weight True
longformer.encoder.layer.6.attention.self.value_global.bias True
longformer.encoder.layer.6.attention.output.dense.weight True
longformer.encoder.layer.6.attention.output.dense.bias True
longformer.encoder.layer.6.attention.output.LayerNorm.weight True
longformer.encoder.layer.6.attention.output.LayerNorm.bias True
longformer.encoder.layer.6.intermediate.dense.weight True
longformer.encoder.layer.6.intermediate.dense.bias True
longformer.encoder.layer.6.output.dense.weight True
longformer.encoder.layer.6.output.dense.bias True
longformer.encoder.layer.6.output.LayerNorm.weight True
longformer.encoder.layer.6.output.LayerNorm.bias True
longformer.encoder.layer.7.attention.self.query.weight True
longformer.encoder.layer.7.attention.self.query.bias True
longformer.encoder.layer.7.attention.self.key.weight True
longformer.encoder.layer.7.attention.self.key.bias True
longformer.encoder.layer.7.attention.self.value.weight True
longformer.encoder.layer.7.attention.self.value.bias True
longformer.encoder.layer.7.attention.self.query_global.weight True
longformer.encoder.layer.7.attention.self.query_global.bias True
longformer.encoder.layer.7.attention.self.key_global.weight True
longformer.encoder.layer.7.attention.self.key_global.bias True
longformer.encoder.layer.7.attention.self.value_global.weight True
longformer.encoder.layer.7.attention.self.value_global.bias True
longformer.encoder.layer.7.attention.output.dense.weight True
longformer.encoder.layer.7.attention.output.dense.bias True
longformer.encoder.layer.7.attention.output.LayerNorm.weight True
longformer.encoder.layer.7.attention.output.LayerNorm.bias True
longformer.encoder.layer.7.intermediate.dense.weight True
longformer.encoder.layer.7.intermediate.dense.bias True
longformer.encoder.layer.7.output.dense.weight True
longformer.encoder.layer.7.output.dense.bias True
longformer.encoder.layer.7.output.LayerNorm.weight True
longformer.encoder.layer.7.output.LayerNorm.bias True
longformer.encoder.layer.8.attention.self.query.weight True
longformer.encoder.layer.8.attention.self.query.bias True
longformer.encoder.layer.8.attention.self.key.weight True
longformer.encoder.layer.8.attention.self.key.bias True
longformer.encoder.layer.8.attention.self.value.weight True
longformer.encoder.layer.8.attention.self.value.bias True
longformer.encoder.layer.8.attention.self.query_global.weight True
longformer.encoder.layer.8.attention.self.query_global.bias True
longformer.encoder.layer.8.attention.self.key_global.weight True
longformer.encoder.layer.8.attention.self.key_global.bias True
longformer.encoder.layer.8.attention.self.value_global.weight True
longformer.encoder.layer.8.attention.self.value_global.bias True
longformer.encoder.layer.8.attention.output.dense.weight True
longformer.encoder.layer.8.attention.output.dense.bias True
longformer.encoder.layer.8.attention.output.LayerNorm.weight True
longformer.encoder.layer.8.attention.output.LayerNorm.bias True
longformer.encoder.layer.8.intermediate.dense.weight True
longformer.encoder.layer.8.intermediate.dense.bias True
longformer.encoder.layer.8.output.dense.weight True
longformer.encoder.layer.8.output.dense.bias True
longformer.encoder.layer.8.output.LayerNorm.weight True
longformer.encoder.layer.8.output.LayerNorm.bias True
longformer.encoder.layer.9.attention.self.query.weight True
longformer.encoder.layer.9.attention.self.query.bias True
longformer.encoder.layer.9.attention.self.key.weight True
longformer.encoder.layer.9.attention.self.key.bias True
longformer.encoder.layer.9.attention.self.value.weight True
longformer.encoder.layer.9.attention.self.value.bias True
longformer.encoder.layer.9.attention.self.query_global.weight True
longformer.encoder.layer.9.attention.self.query_global.bias True
longformer.encoder.layer.9.attention.self.key_global.weight True
longformer.encoder.layer.9.attention.self.key_global.bias True
longformer.encoder.layer.9.attention.self.value_global.weight True
longformer.encoder.layer.9.attention.self.value_global.bias True
longformer.encoder.layer.9.attention.output.dense.weight True
longformer.encoder.layer.9.attention.output.dense.bias True
longformer.encoder.layer.9.attention.output.LayerNorm.weight True
longformer.encoder.layer.9.attention.output.LayerNorm.bias True
longformer.encoder.layer.9.intermediate.dense.weight True
longformer.encoder.layer.9.intermediate.dense.bias True
longformer.encoder.layer.9.output.dense.weight True
longformer.encoder.layer.9.output.dense.bias True
longformer.encoder.layer.9.output.LayerNorm.weight True
longformer.encoder.layer.9.output.LayerNorm.bias True
longformer.encoder.layer.10.attention.self.query.weight True
longformer.encoder.layer.10.attention.self.query.bias True
longformer.encoder.layer.10.attention.self.key.weight True
longformer.encoder.layer.10.attention.self.key.bias True
longformer.encoder.layer.10.attention.self.value.weight True
longformer.encoder.layer.10.attention.self.value.bias True
longformer.encoder.layer.10.attention.self.query_global.weight True
longformer.encoder.layer.10.attention.self.query_global.bias True
longformer.encoder.layer.10.attention.self.key_global.weight True
longformer.encoder.layer.10.attention.self.key_global.bias True
longformer.encoder.layer.10.attention.self.value_global.weight True
longformer.encoder.layer.10.attention.self.value_global.bias True
longformer.encoder.layer.10.attention.output.dense.weight True
longformer.encoder.layer.10.attention.output.dense.bias True
longformer.encoder.layer.10.attention.output.LayerNorm.weight True
longformer.encoder.layer.10.attention.output.LayerNorm.bias True
longformer.encoder.layer.10.intermediate.dense.weight True
longformer.encoder.layer.10.intermediate.dense.bias True
longformer.encoder.layer.10.output.dense.weight True
longformer.encoder.layer.10.output.dense.bias True
longformer.encoder.layer.10.output.LayerNorm.weight True
longformer.encoder.layer.10.output.LayerNorm.bias True
longformer.encoder.layer.11.attention.self.query.weight True
longformer.encoder.layer.11.attention.self.query.bias True
longformer.encoder.layer.11.attention.self.key.weight True
longformer.encoder.layer.11.attention.self.key.bias True
longformer.encoder.layer.11.attention.self.value.weight True
longformer.encoder.layer.11.attention.self.value.bias True
longformer.encoder.layer.11.attention.self.query_global.weight True
longformer.encoder.layer.11.attention.self.query_global.bias True
longformer.encoder.layer.11.attention.self.key_global.weight True
longformer.encoder.layer.11.attention.self.key_global.bias True
longformer.encoder.layer.11.attention.self.value_global.weight True
longformer.encoder.layer.11.attention.self.value_global.bias True
longformer.encoder.layer.11.attention.output.dense.weight True
longformer.encoder.layer.11.attention.output.dense.bias True
longformer.encoder.layer.11.attention.output.LayerNorm.weight True
longformer.encoder.layer.11.attention.output.LayerNorm.bias True
longformer.encoder.layer.11.intermediate.dense.weight True
longformer.encoder.layer.11.intermediate.dense.bias True
longformer.encoder.layer.11.output.dense.weight True
longformer.encoder.layer.11.output.dense.bias True
longformer.encoder.layer.11.output.LayerNorm.weight True
longformer.encoder.layer.11.output.LayerNorm.bias True
classifier.dense.weight True
classifier.dense.bias True
classifier.out_proj.weight True
classifier.out_proj.bias True
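My questions
- Why is param.requires_grad True for all layers? Shouldn't it be False at least for the classifier. layers? Aren't we training them? Does param.requires_grad == True mean that a particular layer is frozen? I am confused by the term requires_grad. Does it mean frozen?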
- If I want to train some of the earlier layers, as shown here, should I use the code below?
    for name, param in model.named_parameters():
        if name.startswith("..."):  # choose whatever you like here
            param.requires_grad = False
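- Considering that training takes a lot of time, is there any specific recommendation about which layers I should train? To begin with, I am planning to train all layers starting with longformer.encoder.layer.11. and
`classifier.dense.weight`
`classifier.dense.bias`
`classifier.out_proj.weight`
`classifier.out_proj.bias`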
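- Do I need to add any additional layers such as dropout, or has LongformerForSequenceClassification.from_pretrained already taken care of that? I don't see any dropout layers in the output above, which is why I am asking.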
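#----------------
Update 1
How can I tell which layers are frozen when using the following code from the answer given by @joe32140? My guess is that everything except the last 4 layers in the output shown in my original question is frozen. But is there an easier way to check?
    for param in model.base_model.parameters():
        param.requires_grad = False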
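requires_grad==True means that we will compute the gradient of this tensor, so by default all layers are trained/fine-tuned. In other words, True means trainable, and False means frozen.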
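To make the meaning of requires_grad concrete, here is a minimal sketch that uses two small torch.nn.Linear layers as hypothetical stand-ins for a pretrained body and a new head (they are not the Longformer model, and the sizes are arbitrary). After one optimizer step, the frozen stand-in should have no gradient and unchanged weights, while the trainable one should have both.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: "body" plays the pretrained encoder, "head" the classifier.
body = nn.Linear(4, 4)
head = nn.Linear(4, 2)

# Freeze the body: autograd will not compute gradients for these tensors.
for param in body.parameters():
    param.requires_grad = False

body_before = body.weight.clone()
head_before = head.weight.clone()

# Hand only the parameters that still require gradients to the optimizer.
optimizer = torch.optim.SGD([p for p in head.parameters() if p.requires_grad], lr=0.1)

# Random data, purely for illustration.
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

loss = nn.functional.cross_entropy(head(body(inputs)), labels)
loss.backward()
optimizer.step()

body_has_grad = body.weight.grad is not None
head_has_grad = head.weight.grad is not None
body_changed = not torch.equal(body.weight, body_before)
head_changed = not torch.equal(head.weight, head_before)
print(body_has_grad, head_has_grad, body_changed, head_changed)
```

Filtering the optimizer's parameter list is optional here, since frozen parameters receive no gradient anyway, but it keeps the optimizer state small.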
- You can train only the output layer by freezing the encoder with
    for param in model.base_model.parameters():
        param.requires_grad = False
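If you would rather keep the last encoder block trainable as well, as the question proposes, one possible variation is to select parameters by name prefix. The sketch below is an assumption-laden illustration: TinyClassifier is a hypothetical stand-in whose parameter names only imitate the longformer.encoder.layer.N. and classifier. naming from the question, and freeze_except is a helper name made up for this example.

```python
from torch import nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the stack of encoder blocks."""

    def __init__(self, hidden=8, num_layers=3):
        super().__init__()
        self.layer = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_layers)])


class TinyBody(nn.Module):
    """Hypothetical stand-in for the pretrained body (embeddings + encoder)."""

    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(10, 8)
        self.encoder = TinyEncoder()


class TinyClassifier(nn.Module):
    """Hypothetical stand-in that mimics the parameter naming of the real model."""

    def __init__(self):
        super().__init__()
        self.longformer = TinyBody()
        self.classifier = nn.Linear(8, 2)


def freeze_except(model, trainable_prefixes):
    """Set requires_grad=True only for parameters whose name starts with a given prefix."""
    prefixes = tuple(trainable_prefixes)
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)


model = TinyClassifier()
# The stand-in has 3 blocks, so block 2 is the last one.
freeze_except(model, ["longformer.encoder.layer.2.", "classifier."])

trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(trainable)
```

For the model in the question, the prefixes would presumably be "longformer.encoder.layer.11." and "classifier.". The trailing dot matters: a prefix such as "longformer.encoder.layer.1" would also match layers 10 and 11.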
Yes, dropout is used in the huggingface output layer implementation. See here: https://github.com/huggingface/transformers/blob/198c335d219a5eb4d3f124fdd1ce1a9cd9f78a9b/src/transformers/models/longformer/modeling_longformer.py#L1938
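The reason no dropout shows up in the printed list is that named_parameters() only lists learnable tensors, and a dropout layer has none. A small sketch with a hypothetical head illustrates this. The dense and out_proj names follow the question's output, while the dropout attribute name, the sizes, and p=0.1 are assumptions made for the example.

```python
from torch import nn


class TinyHead(nn.Module):
    """Hypothetical classification head: dense -> dropout -> out_proj."""

    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(8, 8)
        self.dropout = nn.Dropout(p=0.1)  # p is an arbitrary illustrative value
        self.out_proj = nn.Linear(8, 2)


head = TinyHead()

# Dropout has no weights, so it is absent from the parameter listing...
param_names = [name for name, _ in head.named_parameters()]
# ...but it does appear among the submodules (the empty name is the module itself).
module_names = [name for name, _ in head.named_modules() if name]

print(param_names)
print(module_names)
```

Running the same named_modules() loop on the real model is one way to confirm which dropout layers it contains.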
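As for update 1, yes, base_model refers to the layers excluding the output classification head. However, it is actually two layers rather than four, each with a weight and a bias tensor.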
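An easier way to check than reading the full printout is to split the parameter names by their requires_grad flag and count the elements. The following is a rough sketch: summarize_requires_grad is a helper name invented here, and the nn.ModuleDict is a hypothetical stand-in whose "longformer" entry plays the role of base_model.

```python
from torch import nn


def summarize_requires_grad(model):
    """Return trainable names, frozen names, trainable element count, total element count."""
    trainable, frozen = [], []
    for name, param in model.named_parameters():
        (trainable if param.requires_grad else frozen).append(name)
    n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    n_total = sum(p.numel() for p in model.parameters())
    return trainable, frozen, n_trainable, n_total


# Hypothetical stand-in: one frozen body layer plus a two-layer classification head.
model = nn.ModuleDict({
    "longformer": nn.Linear(8, 8),
    "classifier": nn.ModuleDict({
        "dense": nn.Linear(8, 8),
        "out_proj": nn.Linear(8, 2),
    }),
})

# Same idea as freezing model.base_model in the real model.
for param in model["longformer"].parameters():
    param.requires_grad = False

trainable, frozen, n_trainable, n_total = summarize_requires_grad(model)
print("trainable:", trainable)
print("frozen:", frozen)
print(f"{n_trainable} of {n_total} parameter values are trainable")
```

In this stand-in the trainable list contains four tensors that belong to two layers (dense and out_proj), which mirrors the "two layers rather than four" point above.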