What do confidence scores mean in speech recognition?

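Many speech-to-text services (Google's, for example) provide a confidence score. At least for Google it lies between 0 and 1, but it is clearly not the probability that a particular transcription is correct, as the confidences for alternative transcriptions add up to more than 1. Also, higher-confidence results are sometimes ranked lower.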

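So, what is it? Is there a recognized meaning of 'confidence score' in the speech recognition community? I have seen references to minimum Bayes risk, but even if that is what they are doing, it does not really answer the question, because the result depends on the choice of an auxiliary loss function.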

but is clearly not the probability that a particular transcription is correct, as confidences for alternative transcriptions add up to more than 1

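Statistical algorithms never give you a true probability value; they give you an estimate. An estimate can be wrong in individual cases, but on average it is closer to the ideal value. Confidence has to be calibrated. You can find some of the theory in:

Calibration of Confidence Measures in Speech Recognition, Dong Yu, Senior Member, IEEE, Jinyu Li, Member, IEEE, Li Deng, Fellow, IEEE https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/ConfidenceCalibration.pdf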
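To make "calibration" concrete, here is a minimal sketch using histogram binning, which is a much simpler technique than the ones studied in the paper above and is only meant as an illustration. It assumes you have a held-out set of raw confidence scores together with labels saying whether each result was actually correct. The data values and the function names `fit_histogram_calibration` and `calibrate` are made up for this example.

```python
def fit_histogram_calibration(scores, correct, n_bins=5):
    """Learn a lookup table: raw-score bin -> empirical accuracy in that bin."""
    totals = [0] * n_bins
    hits = [0] * n_bins
    for s, c in zip(scores, correct):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        totals[b] += 1
        hits[b] += int(c)
    # Bins without data fall back to their midpoint (assumption: no better guess).
    return [hits[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]


def calibrate(score, table):
    """Replace a raw score by the accuracy observed for scores in the same bin."""
    n_bins = len(table)
    return table[min(int(score * n_bins), n_bins - 1)]


# Hypothetical held-out data: raw confidences and whether the result was right.
raw_scores = [0.95, 0.92, 0.90, 0.85, 0.70, 0.65, 0.40, 0.30]
is_correct = [1, 1, 0, 1, 1, 0, 0, 0]

table = fit_histogram_calibration(raw_scores, is_correct, n_bins=5)
print(calibrate(0.93, table))  # 0.75: three of the four results in [0.8, 1.0] were correct
```

After this mapping, a calibrated value of 0.75 can be read as "about three out of four results with a similar raw score were correct on the held-out data", which is the kind of interpretation the raw score does not guarantee.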

Is there a recognized meaning of 'confidence score' in the speech recognition community?

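Not really, everyone uses their own algorithm, ranging from simple Bayes risk (which is not the best estimate at all) to more advanced methods. It is not really possible to know what Google does. There is also an implementation of a nice algorithm in Kaldi: https://github.com/kaldi-asr/kaldi/blob/master/egs/ami/s5/local/confidence_calibration.sh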
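As a rough illustration of the minimum Bayes risk idea mentioned in the question, the sketch below rescores an N-best list: it turns log scores into posteriors with a softmax and then picks the hypothesis with the lowest expected word-level edit distance. This is an assumption-laden toy, not a description of what Google or Kaldi does. The hypotheses, the scores, and the function names are invented for the example, and the posteriors are only normalised within the list.

```python
import math


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution or match
        prev = cur
    return prev[-1]


def nbest_posteriors(log_scores):
    """Softmax over the N-best log scores (sums to 1 within this list only)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]


def mbr_decode(hyps, log_scores):
    """Return the hypothesis with minimum expected edit distance, plus details."""
    post = nbest_posteriors(log_scores)
    tokens = [h.split() for h in hyps]
    risks = [sum(p * edit_distance(t, other) for p, other in zip(post, tokens))
             for t in tokens]
    best = min(range(len(hyps)), key=risks.__getitem__)
    return hyps[best], risks, post


# Hypothetical N-best list with made-up scores.
hyps = ["call mum now", "play some jazz", "play some bass"]
log_scores = [math.log(0.40), math.log(0.35), math.log(0.25)]

best, risks, post = mbr_decode(hyps, log_scores)
print(best)
print([round(p, 2) for p in post])
print([round(r, 2) for r in risks])
```

In this toy list the hypothesis with the highest posterior ("call mum now") is not the one chosen, because the two "play some ..." hypotheses support each other under the edit-distance loss. A different loss function would give a different ranking, which is the dependence the question points out.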