Implementing transfer learning with Pegasus for text summarization generates junk characters

I have been trying out the Pegasus library and followed the steps below to generate summaries:

  1. Created the input data as a .tfrecord file in pegasus\data\testdata
  2. Created a function named test_transformers (hypothetical) that returns transformer_params
  3. Ran python3 pegasus/bin/train.py --params=test_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt/xsum/model.ckpt-30000
  4. Ran python3 pegasus/bin/evaluate.py --params=test_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 --model_dir=ckpt/pegasus_ckpt/xsum/model.ckpt-30000

However, when generating text I run into this problem in the output:

Is there a problem with the implementation, or with the way I run the Python code in steps 3 and 4?

Thanks in advance!

Here is a link to the closed issue.

The causes highlighted in that issue are:

1. --model_dir is typically a directory instead of a particular checkpoint. 
   -> Try changing model_dir to actual model directory instead of checkpoint
2. It seems there are only 100 training steps.
   -> Try changing "train_steps": 100 to a larger value
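
As a rough illustration of point 1, here is a minimal Python sketch that turns a checkpoint prefix such as model.ckpt-30000 into its containing directory before assembling the command. The helper names normalize_model_dir and build_command are hypothetical and not part of Pegasus; only the flags already shown in the question are used, and whether this directory is the right one for your setup is an assumption to verify.

```python
import posixpath
import re

# TensorFlow checkpoint prefixes look like "model.ckpt-30000".
_CKPT_PREFIX = re.compile(r"^model\.ckpt-\d+$")


def normalize_model_dir(path):
    """Return the containing directory if `path` is a checkpoint prefix.

    Hypothetical helper, not part of Pegasus.
    """
    path = path.rstrip("/")
    if _CKPT_PREFIX.match(posixpath.basename(path)):
        return posixpath.dirname(path)
    return path


def build_command(script, params, overrides, model_dir):
    """Assemble an argv list using only the flags shown in the question."""
    override_str = ",".join(f"{key}={value}" for key, value in overrides.items())
    return [
        "python3",
        script,
        f"--params={params}",
        f"--param_overrides={override_str}",
        f"--model_dir={normalize_model_dir(model_dir)}",
    ]


cmd = build_command(
    "pegasus/bin/evaluate.py",
    "test_transformer",
    {
        "vocab_filename": "ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model",
        "batch_size": 1,
        "beam_size": 5,
        "beam_alpha": 0.6,
    },
    "ckpt/pegasus_ckpt/xsum/model.ckpt-30000",  # checkpoint prefix from the question
)
print(" ".join(cmd))
```

The printed command ends with --model_dir=ckpt/pegasus_ckpt/xsum, so the flag points at the directory rather than at a single checkpoint. The training-step count from point 2 is not handled here; it still has to be changed in the params function itself.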