Fine-tune a BERT model for context-specific embeddings

python
nlp
bert-language-model

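I'm trying to find information on how to train a BERT model, possibly from the Hugging Face Transformers library, so that the embeddings it outputs are more closely related to the context of the text I'm working with.

However, all the examples I can find are about fine-tuning the model for another task, such as classification.

Would anyone happen to have an example of fine-tuning BERT on masked token prediction or next sentence prediction, so that the result is another raw BERT model that has been adapted to the context?

Thanks!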

Here is an example from the Transformers library on fine-tuning a language model for masked token prediction.

The model used is one of the BERT language-modeling classes, for example BertForMaskedLM. The idea is to create a dataset using TextDataset, which tokenizes the text and breaks it into chunks. Then use a DataCollatorForLanguageModeling to randomly mask tokens in the chunks during training, and pass the model, the data and the collator to the Trainer to train and evaluate the results.
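The steps above could look roughly like the sketch below. It is not the library's official example. It assumes a 4.x release of transformers in which TextDataset is still available (it has been deprecated in favor of the datasets library). The function names `fine_tune_mlm` and `load_encoder`, the `bert-base-uncased` checkpoint, and the hyperparameter values are placeholders chosen for illustration.

```python
from transformers import (
    AutoModel,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TextDataset,
    Trainer,
    TrainingArguments,
)


def fine_tune_mlm(train_file, output_dir, model_name="bert-base-uncased",
                  block_size=128, epochs=1):
    """Continue masked-LM training of a pretrained BERT on a plain-text file.

    All default values here are illustrative assumptions, not recommendations.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # TextDataset tokenizes the file and cuts it into fixed-size chunks.
    dataset = TextDataset(tokenizer=tokenizer, file_path=train_file,
                          block_size=block_size)

    # The collator masks tokens at random each time a batch is assembled.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=8,
    )
    trainer = Trainer(model=model, args=args, data_collator=collator,
                      train_dataset=dataset)
    trainer.train()

    # Save weights and tokenizer together so they can be reloaded as a pair.
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)


def load_encoder(output_dir):
    """Reload the fine-tuned checkpoint as a plain encoder (no MLM head)."""
    tokenizer = AutoTokenizer.from_pretrained(output_dir)
    encoder = AutoModel.from_pretrained(output_dir)
    return tokenizer, encoder
```

Calling `fine_tune_mlm("corpus.txt", "bert-finetuned")` (both names hypothetical) and then `load_encoder("bert-finetuned")` should give back a raw BERT encoder whose `last_hidden_state` can be used for embeddings. As far as I know the masked-LM variant is trained without the pooling layer, so `pooler_output` from the reloaded model is probably not meaningful.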
