How to find out the confidence level of audio transcription with the Azure SpeechService SDK for Python

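I am testing the code below to transcribe a long audio file. It turns out that I need the confidence score of the transcription for each word, so that the transcription quality can be checked later.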

import os
import time

import azure.cognitiveservices.speech as speechsdk

# Credentials read from environment variables (variable names are an assumption;
# substitute however you load your key and region)
SUBSCRIPTION_KEY = os.environ["SPEECH_KEY"]
REGION = os.environ["SPEECH_REGION"]


def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)
    speech_config.speech_recognition_language = "pt-BR"
    speech_config.enable_dictation()
    # Request the detailed format (NBest list with Confidence) instead of the magic value 1
    speech_config.output_format = speechsdk.OutputFormat.Detailed
    audio_config = speechsdk.audio.AudioConfig(filename="file.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    done = False

    def stop_cb(evt):
        """callback that signals to stop continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)
    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    # Wait until the session ends instead of a fixed 15 s, which cuts off long audio
    while not done:
        time.sleep(0.5)
    speech_recognizer.stop_continuous_recognition()


speech_recognize_continuous_from_file()

I know these values can be obtained through the REST API, but so far I have not found a way to get this confidence score with the Python SDK.
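If the detailed JSON payload is available (from the REST API or the SDK), the scores can be pulled out with plain `json` parsing. This is a minimal sketch assuming the payload has the detailed-format shape: an `NBest` list whose entries carry `Confidence` and `Display`, plus a `Words` list when word-level timing is included. Whether each word entry has its own `Confidence` key depends on the service and SDK version, so the sketch returns `None` where it is missing. The helper `extract_confidences` and the sample payload are hypothetical, made up for illustration.

```python
import json


def extract_confidences(result_json: str) -> dict:
    """Return the top hypothesis text, its confidence and per-word scores.

    Assumes the detailed-format JSON shape ("NBest" list). Word-level
    "Confidence" is read with .get() because it may be absent.
    """
    payload = json.loads(result_json)
    nbest = payload.get("NBest") or []
    if not nbest:
        # No hypotheses, e.g. a simple-format payload or no match
        return {"text": payload.get("DisplayText", ""), "confidence": None, "words": []}
    best = nbest[0]  # assumes hypotheses are ordered best-first
    words = [(w.get("Word"), w.get("Confidence")) for w in best.get("Words", [])]
    return {"text": best.get("Display", ""), "confidence": best.get("Confidence"), "words": words}


# Hypothetical payload, only to show the expected structure
sample = json.dumps({
    "RecognitionStatus": "Success",
    "DisplayText": "Olá mundo.",
    "NBest": [{
        "Confidence": 0.93,
        "Display": "Olá mundo.",
        "Words": [
            {"Word": "olá", "Offset": 0, "Duration": 4000000, "Confidence": 0.95},
            {"Word": "mundo", "Offset": 4000000, "Duration": 5000000, "Confidence": 0.91},
        ],
    }],
})
print(extract_confidences(sample))
```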

I also changed the speech_recognizer output_format to 'detailed' so that I could get the NBest description, but when I use the start_continuous_recognition method, all of the NBest details are suppressed.
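The printed `RECOGNIZED` line only shows a summary of the event. A possible workaround is to read the raw JSON from each `recognized` event yourself. This sketch assumes an SDK version in which `evt.result.json` returns the service's JSON string for the result; the factory `make_recognized_handler` is a hypothetical name. It does not import the SDK, so it can be exercised locally with a stand-in event object.

```python
import json
from types import SimpleNamespace


def make_recognized_handler(sink: list):
    """Build a callback for speech_recognizer.recognized that stores NBest data.

    Assumes evt.result.json holds the detailed-format JSON string.
    """
    def on_recognized(evt):
        payload = json.loads(evt.result.json)
        nbest = payload.get("NBest") or []
        if nbest:  # skip events without hypotheses (e.g. NoMatch)
            best = nbest[0]
            sink.append({
                "offset": payload.get("Offset"),
                "text": best.get("Display", ""),
                "confidence": best.get("Confidence"),
            })
    return on_recognized


# Stand-in for a real recognition event, used only for local testing
results = []
handler = make_recognized_handler(results)
fake_evt = SimpleNamespace(result=SimpleNamespace(
    json='{"Offset": 0, "NBest": [{"Confidence": 0.88, "Display": "Bom dia."}]}'))
handler(fake_evt)
print(results)
```

In the question's code this would be wired up with `speech_recognizer.recognized.connect(make_recognized_handler(results))`, keeping `output_format` set to the detailed format. The callback runs on an SDK thread; appending to a list is safe in CPython, but heavier processing is better done after recognition finishes.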

Currently, detailed output that shows the confidence score is available in C#, but not in the other languages.

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/speech_recognition_samples.cs#L102