What does the appearance of BERT's special tokens in SQuAD QA answers mean?
I am running fine-tuned BERT and ALBERT models for question answering, and I am evaluating their performance on a subset of questions from SQuAD v2.0 using SQuAD's official evaluation script.
I use Huggingface transformers; below you can find the actual code I am running, with an example (using an ALBERT model fine-tuned on SQuAD v2.0):
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("ktrapeznikov/albert-xlarge-v2-squad-v2")
model = AutoModelForQuestionAnswering.from_pretrained("ktrapeznikov/albert-xlarge-v2-squad-v2")

question = "Why aren't the examples of bouregois architecture visible today?"
text = """Exceptional examples of the bourgeois architecture of the later periods were not restored by the communist authorities after the war (like mentioned Kronenberg Palace and Insurance Company Rosja building) or they were rebuilt in socialist realism style (like Warsaw Philharmony edifice originally inspired by Palais Garnier in Paris). Despite that the Warsaw University of Technology building (1899\u20131902) is the most interesting of the late 19th-century architecture. Some 19th-century buildings in the Praga district (the Vistula\u2019s right bank) have been restored although many have been poorly maintained. Warsaw\u2019s municipal government authorities have decided to rebuild the Saxon Palace and the Br\u00fchl Palace, the most distinctive buildings in prewar Warsaw."""

input_dict = tokenizer.encode_plus(question, text, return_tensors="pt")
input_ids = input_dict["input_ids"].tolist()
start_scores, end_scores = model(**input_dict)  # transformers < v4 returns a (start_logits, end_logits) tuple

all_tokens = tokenizer.convert_ids_to_tokens(input_ids[0])
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1]).replace('▁', '')
print(answer)
The output looks like this:
[CLS] why aren ' t the examples of bour ego is architecture visible today ? [SEP] exceptional examples of the bourgeois architecture of the later periods were not restored by the communist authorities after the war
As you can see, the answer contains BERT's special tokens, including [CLS] and [SEP].
I understand that if the answer is just [CLS] (with start_scores and end_scores both being tensor(0)), it basically means the model thinks there is no answer to the question in the context. In those cases, I simply set the answer for that question to an empty string when running the evaluation script.
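That mapping can be sketched with a small helper (hypothetical, not from my actual pipeline; sep_index is assumed to be the position of the first [SEP] token):

```python
def clean_answer(tokens, start, end, sep_index):
    # A span is only valid if it lies entirely inside the context,
    # i.e. strictly after the first [SEP]. [CLS] sits at index 0, so the
    # (0, 0) "no answer" prediction also maps to an empty string.
    if start <= sep_index or end < start:
        return ""
    return " ".join(tokens[start:end + 1]).replace("▁", "").strip()
```

With this helper, a [CLS]-only prediction (start == end == 0) yields "", which is what the official evaluation script expects for unanswerable questions.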
But I am wondering whether, in the example above, I should again assume that the model could not find an answer and set the answer to an empty string, or whether I should leave the answer as-is when evaluating the model's performance.
I am asking because, as far as I understand, the performance computed by the evaluation script may change depending on how I handle such cases (please correct me if I am wrong), and I might not get a realistic picture of these models' performance.
You should simply treat them as invalid, because you are trying to predict a valid answer span from the variable text; everything else should be considered invalid. This is also how huggingface treats these predictions:
We could hypothetically create invalid predictions, e.g., predict that the start of the span is in the question. We throw out all invalid predictions.
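With the simple argmax approach, one way to enforce that rule is to mask out everything that is not part of the context before taking the argmax. A minimal sketch, assuming token_type_ids marks the second sequence (the context) with 1, as the ALBERT tokenizer does (note the trailing [SEP] also has type 1, so a fuller version would exclude it too):

```python
import torch

def best_valid_span(start_logits, end_logits, token_type_ids):
    # token_type_ids is 0 for [CLS] + question + first [SEP] and 1 for
    # the context; setting non-context logits to -inf keeps the argmax
    # inside the context, so [CLS]/[SEP]/question tokens can't be picked.
    not_context = token_type_ids[0] == 0
    start = start_logits[0].masked_fill(not_context, float("-inf")).argmax().item()
    end = end_logits[0].masked_fill(not_context, float("-inf")).argmax().item()
    return start, end
```

This discards invalid spans before they are ever produced, instead of filtering them afterwards.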
You should also note that they use a more sophisticated method to get the predictions for each question (don't ask me why they show torch.argmax in their example). Please have a look at the example below:
from transformers.data.processors.squad import SquadResult, SquadExample, SquadFeatures, SquadV2Processor, squad_convert_examples_to_features
from transformers.data.metrics.squad_metrics import compute_predictions_logits, squad_evaluate

###
# your example code
###
outputs = model(**input_dict)

def to_list(tensor):
    return tensor.detach().cpu().tolist()

# convert the output tensors to plain python lists
output = [to_list(output[0]) for output in outputs]
start_logits, end_logits = output

all_results = []
# 1000000000 is the unique_id of the (single) feature
all_results.append(SquadResult(1000000000, start_logits, end_logits))
# this is the answers section from the evaluation dataset
answers = [{'text':'not restored by the communist authorities', 'answer_start':77}, {'text':'were not restored', 'answer_start':72}, {'text':'not restored by the communist authorities after the war', 'answer_start':77}]
examples = [SquadExample('0', question, text, 'not restored by the communist authorities', 75, 'Warsaw', answers, False)]

# this does basically the same as tokenizer.encode_plus() but stores the
# result in a SquadFeatures object and splits the context if necessary
features = squad_convert_examples_to_features(examples, tokenizer, 512, 100, 64, True)
predictions = compute_predictions_logits(
    examples,
    features,
    all_results,
    20,                    # n_best_size
    30,                    # max_answer_length
    True,                  # do_lower_case
    'pred.file',           # output_prediction_file
    'nbest_file',          # output_nbest_file
    'null_log_odds_file',  # output_null_log_odds_file
    False,                 # verbose_logging
    True,                  # version_2_with_negative
    0.0,                   # null_score_diff_threshold
    tokenizer
)
result = squad_evaluate(examples, predictions)
print(predictions)
for x in result.items():
print(x)
Output:
OrderedDict([('0', 'communist authorities after the war')])
('exact', 0.0)
('f1', 72.72727272727273)
('total', 1)
('HasAns_exact', 0.0)
('HasAns_f1', 72.72727272727273)
('HasAns_total', 1)
('best_exact', 0.0)
('best_exact_thresh', 0.0)
('best_f1', 72.72727272727273)
('best_f1_thresh', 0.0)
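For completeness: with version_2_with_negative=True, compute_predictions_logits also compares the null ([CLS]) score against the best non-null span and emits an empty string when their difference exceeds null_score_diff_threshold. The decision itself boils down to this (a simplified sketch, not the actual transformers code):

```python
def apply_null_threshold(best_span_text, null_score, best_span_score, threshold=0.0):
    # SQuAD v2.0 convention: if the "no answer" score beats the best
    # span score by more than the threshold, predict the empty string.
    score_diff = null_score - best_span_score
    return "" if score_diff > threshold else best_span_text
```

So you do not need to special-case [CLS]-only predictions yourself when you go through this pipeline; the threshold logic handles the no-answer mapping for you.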