Difference in word confidence in IBM Watson Speech to Text

Question

I'm using the Node SDK to call the IBM Watson Speech to Text service. After sending an audio sample and receiving the response, the confidence values look odd.

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 3.31,
          "alternatives": [
            {
              "confidence": 0.7563,
              "word": "you"
            },
            {
              "confidence": 0.0254,
              "word": "look"
            },
            {
              "confidence": 0.0142,
              "word": "Lou"
            },
            {
              "confidence": 0.0118,
              "word": "we"
            }
          ],
          "end_time": 3.43
        },
...

and

...
],
"alternatives": [
    {
      "word_confidence": [
        [
          "you",
          0.36485132893469713
        ],
...

I request recognition with this configuration:

  var params = {
    audio: fs.createReadStream(req.file.path),
    content_type: 'audio/wav',
    interim_results: false,
    word_confidence: true,
    timestamps: true,
    max_alternatives: 3,
    continuous: true,
    word_alternatives_threshold: 0.01,
    smart_formatting: true
  };

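Notice that the confidence for the word "you" differs between the two places: 0.7563 under "word_alternatives" and about 0.365 under "word_confidence". Do these two numbers measure different things? What is going on here?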

Answer 1

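John, the confidence values in "word_alternatives" come from the confusion network and are computed at the word level, whereas the confidence values in the "alternatives" list are computed on the lattice, at the sentence level.

The confusion network is derived from the lattice, but it contains a different representation of the hypothesis space, which explains why the confidence values from the two sources can differ.

In this case the sentence contains only one word, which is why the difference is so noticeable.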
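If you want to see the two values side by side for every word, something like the following may help. It is only a sketch: `compareWordConfidence` is a hypothetical helper, not part of the SDK, the field names `latticeConfidence` and `networkConfidence` are my own labels based on the explanation above, and the `sample` object is trimmed down to the values visible in the question. It also matches words by text only, which will mis-pair a word that occurs more than once in an utterance; matching on timestamps would be more robust.

```javascript
// Hypothetical helper: pairs the per-word confidence from the best transcript
// ("word_confidence") with the confidence of the same word in the
// "word_alternatives" list. Assumes the response shape shown in the question.
function compareWordConfidence(response) {
  const rows = [];
  for (const result of response.results || []) {
    const best = (result.alternatives || [])[0];
    if (!best || !best.word_confidence) continue;
    const slots = result.word_alternatives || [];
    for (const [word, latticeConfidence] of best.word_confidence) {
      // Simplification: take the first time slot that lists this word.
      let networkConfidence = null;
      for (const slot of slots) {
        const hit = (slot.alternatives || []).find((a) => a.word === word);
        if (hit) {
          networkConfidence = hit.confidence;
          break;
        }
      }
      rows.push({ word, latticeConfidence, networkConfidence });
    }
  }
  return rows;
}

// Trimmed sample built only from the values shown in the question.
const sample = {
  results: [
    {
      word_alternatives: [
        {
          start_time: 3.31,
          alternatives: [
            { confidence: 0.7563, word: 'you' },
            { confidence: 0.0254, word: 'look' },
            { confidence: 0.0142, word: 'Lou' },
            { confidence: 0.0118, word: 'we' }
          ],
          end_time: 3.43
        }
      ],
      alternatives: [
        { word_confidence: [['you', 0.36485132893469713]] }
      ]
    }
  ]
};

const rows = compareWordConfidence(sample);
console.log(rows);
```

Neither number is wrong; they answer slightly different questions, so pick one source and use it consistently when you threshold or rank words.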


speech-recognition

speech-to-text

ibm-cloud