How to find out the confidence level of an audio transcription with the Azure Speech Service SDK for Python
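I am testing the code below to transcribe a long audio file, and it turns out I need the confidence score for each word of the transcription so that I can check the transcription quality later.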
import azure.cognitiveservices.speech as speechsdk
import time

# SUBSCRIPTION_KEY and REGION must be defined with your Speech resource key and region.

def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)
    speech_config.speech_recognition_language = "pt-BR"
    speech_config.enable_dictation()
    # Request the detailed output format (the same value as OutputFormat(1))
    speech_config.output_format = speechsdk.OutputFormat.Detailed
    audio_config = speechsdk.audio.AudioConfig(filename="file.wav")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that signals to stop continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    # Wait for stop_cb to fire so a long file is not cut off after a fixed 15 seconds
    while not done:
        time.sleep(.5)
    speech_recognizer.stop_continuous_recognition()

speech_recognize_continuous_from_file()
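I know these values can be obtained through the REST API, but so far I have not found a way to get this confidence score with the Python SDK.
I have also changed the speech_recognizer output_format to 'detailed' so that I can get the NBest details, but it turns out that when I use the start_continuous_recognition method, all of the NBest details are suppressed.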
The detailed output currently does include confidence scores. They are exposed in C#, but not in the other languages.
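Even without a dedicated accessor, a possible workaround in Python is to parse the raw JSON attached to each recognized result. The following is only a minimal sketch: it assumes the result object exposes the service response through its `json` property and that, with the detailed output format, the payload carries an `NBest` list whose entries have `Confidence` and `Display` fields. The helper names `extract_confidences`, `on_recognized` and `collected` are hypothetical and not part of the SDK.

```python
import json


def extract_confidences(result_json):
    """Return (display_text, confidence) pairs from a detailed-format JSON string.

    Assumes the layout {"NBest": [{"Confidence": ..., "Display": ...}, ...]};
    a payload without "NBest" (for example a no-match result) yields [].
    """
    payload = json.loads(result_json)
    return [
        (alt.get("Display", ""), alt.get("Confidence"))
        for alt in payload.get("NBest", [])
    ]


# One entry per recognized utterance, kept for a later quality check.
collected = []


def on_recognized(evt):
    """Hypothetical callback for the `recognized` event of the recognizer."""
    # Assumption: evt.result.json holds the raw service response as a string.
    collected.append(extract_confidences(evt.result.json))
```

It would be wired in with `speech_recognizer.recognized.connect(on_recognized)` in place of the printing lambda. Note that these scores are per utterance hypothesis. Calling `speech_config.request_word_level_timestamps()` should add a `Words` list to each `NBest` entry, but whether each word also carries its own confidence value may depend on the service and SDK version, so it is worth inspecting the raw JSON first.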