How to access the audio stream recorded by the Microsoft Speech SDK

Question

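I'm using a robot to hold conversations with volunteers, and I'm using Python 3 with Microsoft's Speech SDK to transcribe the volunteers' answers. Both the recording and the transcription are done by the Speech SDK, and I haven't been able to find a way to access and save the recorded audio.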

Minimal code example:

import time
import azure.cognitiveservices.speech as speechsdk

stop = False  # set to True by the callback once "stop" is recognized

# define callback: called once for each finalized utterance
def handle_final_result(evt):
    global stop
    print('Heard:', evt.result.text)
    # recognized text is usually capitalized/punctuated ("Stop."), so compare in lowercase
    if 'stop' in evt.result.text.lower():
        stop = True
        # TODO: somehow need to save all audio up to this point

# setup speech recognizer using microphone as input
audio_config = speechsdk.audio.AudioConfig(device_name='sysdefault:CARD=Microphone')
speech_key, service_region = "your-key-here", "your-region-here"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# setup callback and start listening
speech_recognizer.recognized.connect(handle_final_result)
speech_recognizer.start_continuous_recognition()
while not stop:
    time.sleep(0.2)
# blocking call: returns only after recognition has actually stopped
speech_recognizer.stop_continuous_recognition()

There is a similar post/answer for JavaScript, but I couldn't get that sample to work in Python 3.

Answer 1

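Currently, the Speech SDK does not provide an API for capturing the microphone audio that is used for speech transcription; support for this is planned for a future release. If you need access to the microphone data, the recommended approach for now is to open the microphone stream in your application, outside the Speech SDK, and then feed the audio data to speech transcription through, for example, the Speech SDK's push stream API. Because your application owns the stream, it can also capture or process the audio however you need.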

https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.audio.pushaudioinputstream?view=azure-python
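Following the answer's suggestion, one way to keep the audio is to place a small "tee" between your own microphone capture and the push stream, so every chunk sent to the recognizer is also written to a WAV file. This is a minimal sketch under stated assumptions: `TeeAudioWriter` is a hypothetical helper (not part of the Speech SDK), it uses only the standard-library `wave` module, and it assumes 16 kHz, 16-bit, mono PCM, which matches the default format of `PushAudioInputStream`. Any object with `write(bytes)` and `close()` methods can serve as the push target.

```python
import wave


class TeeAudioWriter:
    """Hypothetical helper: forward raw PCM chunks to a push stream and save them to a WAV file.

    `push_stream` is assumed to be any object with write(bytes) and close(),
    e.g. a speechsdk.audio.PushAudioInputStream. The defaults assume
    16 kHz, 16-bit, mono PCM (the push stream's default format).
    """

    def __init__(self, push_stream, wav_path, sample_rate=16000, sample_width=2, channels=1):
        self._push_stream = push_stream
        self._wav = wave.open(wav_path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)  # bytes per sample: 2 == 16-bit
        self._wav.setframerate(sample_rate)

    def write(self, chunk: bytes) -> None:
        # Save exactly the bytes the recognizer receives, then forward them.
        self._wav.writeframes(chunk)
        self._push_stream.write(chunk)

    def close(self) -> None:
        # Finalize the WAV header, then signal end-of-stream to the recognizer.
        self._wav.close()
        self._push_stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions
```

To wire it up, create `push_stream = speechsdk.audio.PushAudioInputStream()`, pass `speechsdk.audio.AudioConfig(stream=push_stream)` to `SpeechRecognizer` instead of the `device_name` config, and call `tee.write(...)` with each chunk of 16-bit PCM bytes from whichever microphone library you use (for example PyAudio or sounddevice; neither is part of the Speech SDK). Calling `tee.close()` when "stop" is heard finalizes the WAV file with all audio captured up to that point and ends the recognizer's input stream.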

Tags: audio-recording, speech-to-text, python-3.x, microsoft-cognitive