How to use trained BERT model checkpoints for prediction?

Question

I trained BERT on SQuAD 2.0 using BERT-master/run_squad.py and got model.ckpt.data, model.ckpt.meta, model.ckpt.index (F1 score: 81), and predictions.json, among other files, in the output directory:

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True

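I tried copying model.ckpt.meta, model.ckpt.index, and model.ckpt.data to the $BERT_LARGE_DIR directory and changed the run_squad.py flags as follows, so that it only predicts answers instead of training on the dataset: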

python run_squad.py \
  --vocab_file=$BERT_LARGE_DIR/vocab.txt \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/model.ckpt \
  --do_train=False \
  --train_file=$SQUAD_DIR/train-v2.0.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v2.0.json \
  --train_batch_size=24 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://some_bucket/squad_large/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME \
  --version_2_with_negative=True

It throws an error saying that the bucket directory/model.ckpt does not exist.

How can I use the checkpoints generated after training to run prediction?

Answer 1

I think the init_checkpoint flag in your second command should be:

--init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt

as in your first command, rather than --init_checkpoint=$BERT_LARGE_DIR/model.ckpt.

If the problem persists, are you using the multi_cased_L-12_H-768_A-12 pre-trained model?

Answer 2

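Typically, during training, the trained checkpoints are written to the directory specified by the --output_dir flag (in your case, gs://some_bucket/squad_large/). Each checkpoint has a number in its name. You have to identify the largest number, for example model.ckpt-12345. Then, for evaluation/prediction, set the --init_checkpoint flag using the output directory and the last saved checkpoint (the one with the highest number). In your case, it should be something like --init_checkpoint=gs://some_bucket/squad_large/model.ckpt-<highest number>.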
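If you would rather not look up the number by hand, here is a minimal sketch of the selection step in plain Python. It assumes you already have a listing of the file names in the output directory. The `files` list below is made up for illustration, and `latest_checkpoint_prefix` is a hypothetical helper, not part of the BERT repository.

```python
import re

# Matches checkpoint files such as "model.ckpt-12345.index" and captures
# the checkpoint prefix (group 1) and the step number (group 2).
_CKPT_PATTERN = re.compile(
    r"^(model\.ckpt-(\d+))\.(?:index|meta|data-\d+-of-\d+)$"
)


def latest_checkpoint_prefix(filenames):
    """Return the checkpoint prefix with the highest step number, or None."""
    best_step, best_prefix = -1, None
    for name in filenames:
        match = _CKPT_PATTERN.match(name)
        if match is None:
            continue  # not a checkpoint file (e.g. predictions.json)
        step = int(match.group(2))  # compare numerically, not as strings
        if step > best_step:
            best_step, best_prefix = step, match.group(1)
    return best_prefix


# Hypothetical listing of the output directory (illustration only).
files = [
    "checkpoint",
    "model.ckpt-9000.index",
    "model.ckpt-9000.meta",
    "model.ckpt-12345.index",
    "model.ckpt-12345.meta",
    "model.ckpt-12345.data-00000-of-00001",
    "predictions.json",
]

output_dir = "gs://some_bucket/squad_large"
prefix = latest_checkpoint_prefix(files)
print(f"--init_checkpoint={output_dir}/{prefix}")
```

Note that the value passed to --init_checkpoint is the prefix (model.ckpt-12345), not one of the individual .index, .meta, or .data files. If TensorFlow is available where you run this, tf.train.latest_checkpoint(output_dir) should give you the same prefix by reading the checkpoint state file in that directory.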

python

neural-network

tensorflow

google-cloud-tpu

bert-language-model