Why is there no pooler layer in huggingface's FlauBERT model?

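The BERT model for language modeling and sequence classification includes an extra projection layer between the last transformer and the classification layer (it contains a linear layer of size hidden_dim x hidden_dim, a dropout layer and a tanh activation). This is not described in the original paper, but it was clarified here. This intermediate layer is pre-trained together with the rest of the transformers.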

In huggingface's BertModel, this layer is called pooler.
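To make the shape of that layer concrete, here is a minimal PyTorch sketch of a pooler-style projection. The class name `PoolerSketch` and the tensor sizes are hypothetical, chosen only for illustration. It assumes the layer reads the hidden state of the first token and applies a hidden_dim x hidden_dim linear layer followed by tanh; dropout is left out here on the assumption that it is applied afterwards, in the classification head.

```python
import torch
from torch import nn


class PoolerSketch(nn.Module):
    """Illustrative pooler-style projection (hypothetical name, not the library class)."""

    def __init__(self, hidden_dim):
        super().__init__()
        # hidden_dim x hidden_dim projection, as described in the question.
        self.dense = nn.Linear(hidden_dim, hidden_dim)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim).
        # Assumption: only the first token's hidden state is projected.
        first_token = hidden_states[:, 0]
        return self.activation(self.dense(first_token))


# Arbitrary sizes: batch of 2, sequence length 5, hidden_dim 8.
hidden_states = torch.randn(2, 5, 8)
pooled = PoolerSketch(hidden_dim=8)(hidden_states)
print(pooled.shape)
```

The output has one vector per sequence, which is what a classification layer on top would consume.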

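According to the paper, the FlauBERT model (an XLMModel fine-tuned on a French corpus) also includes this pooler layer: "The classification head is composed of the following layers, in order: dropout, linear, tanh activation, dropout, and linear." However, when loading a FlauBERT model with huggingface (e.g. FlaubertModel.from_pretrained(...) or FlaubertForSequenceClassification.from_pretrained(...)), the model does not seem to include such a layer.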
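One way to check this without downloading any weights is to build tiny, randomly initialised models from configs and list their top-level submodules. This is only a sketch: the config sizes below are arbitrary, `top_level_modules` is a hypothetical helper, and the observation is assumed to carry over to checkpoints loaded with from_pretrained(...).

```python
from transformers import BertConfig, BertModel, FlaubertConfig, FlaubertModel

# Tiny random models; the sizes are arbitrary and only keep construction fast.
bert = BertModel(
    BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
    )
)
flaubert = FlaubertModel(
    FlaubertConfig(vocab_size=100, emb_dim=32, n_layers=1, n_heads=2)
)


def top_level_modules(model):
    """Return the names of the model's direct submodules (hypothetical helper)."""
    return [name for name, _ in model.named_children()]


print("BERT:", top_level_modules(bert))
print("FlauBERT:", top_level_modules(flaubert))

# hasattr is False when no submodule or attribute with that name is registered.
bert_has_pooler = hasattr(bert, "pooler")
flaubert_has_pooler = hasattr(flaubert, "pooler")
print(bert_has_pooler, flaubert_has_pooler)
```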

Hence the question: why is there no pooler layer in huggingface's FlauBERT model?

Because Flaubert is an XLM model and not a BERT model.

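The pooler is needed for the next sentence classification task. This task was removed from Flaubert's training, which makes the pooler an optional layer. HuggingFace commented that "pooler's output is usually not a good summary of the semantic content of the input, you're often better with averaging or pooling the sequence of hidden-states for the whole input sequence". Thus I believe they decided to remove the layer.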
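The averaging alternative mentioned in that comment can be sketched as follows. This is a minimal example, assuming you already have the last hidden states with shape (batch, seq_len, hidden_dim) and an attention mask with 1 for real tokens and 0 for padding; `mean_pool` is a hypothetical helper name, not a library function.

```python
import torch


def mean_pool(hidden_states, attention_mask):
    """Average token hidden states over the sequence, ignoring padded positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over hidden_dim.
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    summed = (hidden_states * mask).sum(dim=1)
    # Guard against division by zero for a fully padded row.
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# One sequence of three tokens; the last one is padding and must be ignored.
hidden_states = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])
sentence_vector = mean_pool(hidden_states, attention_mask)
print(sentence_vector)  # tensor([[2., 3.]])
```

The resulting vector can then be fed to a classification layer in place of a pooler output.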


bert-language-model

huggingface-transformers