Microsoft Cognitive SpeechRecognizer Stuck

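I am trying to use the MS Cognitive Speech SDK to do speech-to-text on some wave files. It works fine for some files but gets stuck on others. By "stuck" I mean it does not stop until it is cancelled manually.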

I first tried the RecognizeOnceAsync method:

private static void processRecording()
{
    var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
    speechConfig.SpeechRecognitionLanguage = "es-MX";
    speechConfig.OutputFormat = OutputFormat.Detailed;

    using (var audioStream = new PushAudioInputStream())
    {
        audioStream.Write(File.ReadAllBytes("myfilepath"));
        using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
        {
            using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
            {
                var result = speechRecognizer.RecognizeOnceAsync().Result;
                switch (result.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                        break;
                    case ResultReason.NoMatch:
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        break;
                    case ResultReason.Canceled:
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}, ErrorCode={cancellation.ErrorCode}, ErrorDetails={cancellation.ErrorDetails}");
                        break;
                }
            }
        }
    }
}

This gives me (after one minute):

CANCELED: Reason=Error, ErrorCode=ServiceTimeout, ErrorDetails=Timeout: no recognition result received SessionId: 322853a3085d41ec9b60ee940531038c

Then I tried StartContinuousRecognitionAsync:

private async static Task processRecordingsAsync()
{
    var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
    speechConfig.SpeechRecognitionLanguage = "es-MX";
    speechConfig.OutputFormat = OutputFormat.Detailed;

    var waiter = new System.Threading.ManualResetEvent(false);

    var audioStream = new PushAudioInputStream();
    audioStream.Write(File.ReadAllBytes("myfilepath"));
    var audioConfig = AudioConfig.FromStreamInput(audioStream);
    var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Action cleanup = () =>
    {
        waiter.Set();
        try { speechRecognizer.Dispose(); } catch { }
        try { audioConfig.Dispose(); } catch { }
        try { audioStream.Dispose(); } catch { }
    };
    speechRecognizer.Recognizing += (sender, e) => Console.WriteLine($"Recognizing: {e.Result.Text}");
    speechRecognizer.SessionStarted += (sender, e) => Console.WriteLine($"Recognize session started");
    speechRecognizer.SessionStopped += (sender, e) => Console.WriteLine($"Recognize session stopped");
    speechRecognizer.SpeechEndDetected += (sender, e) => Console.WriteLine($"Speech end detected");
    speechRecognizer.SpeechStartDetected += (sender, e) => Console.WriteLine($"Speech start detected");
    speechRecognizer.Recognized += (sender, e) =>
    {
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            Console.WriteLine($"Recognized text: {e.Result.Text}");
        }
        else
        {
            Console.WriteLine($"Could not recognize text: {e.Result.Reason}");
        }
        cleanup();
    };
    speechRecognizer.Canceled += (sender, e) =>
    {
        Console.WriteLine($"Error trying to recognize text: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}, ErrorDetails = {e.ErrorDetails}");
        cleanup();
    };
    await speechRecognizer.StartContinuousRecognitionAsync();
    if (!waiter.WaitOne(60000))
    {
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}

This gives me:

Recognize session started
Speech start detected
Recognizing: con el
Recognizing: con el servicio de tele
Recognizing: con el servicio de tele terapia
Recognizing: con el servicio de tele terapia de
Recognizing: con el servicio de tele terapia de tercer
Recognize session stopped
Error trying to recognize text: Reason = Error, ErrorCode = ServiceTimeout, ErrorDetails = Timeout while waiting for service to stop SessionId: e289298cf97447b89bd088a665e6c095

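So it transcribes about 90% of the file (which is about 4 seconds long), but then it gets stuck and does not finish until I force it to with StopContinuousRecognitionAsync.

When I try this file in Speech Studio, it recognizes almost exactly the same text, but it does not get stuck.

Note that I am using a free subscription. Could that be the cause? Is there anything else I can try?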

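The reason you are seeing this is that the audio input stream in use is still patiently "waiting" for the possibility of more data being pushed to it. The stream has no way of knowing that it holds the complete contents of a file, rather than a forwarded real-time input stream that has just been blocked for a few seconds. If there is not enough trailing silence at the end of the stream, that hypothetical future data could even affect the final recognition result you receive, which is why the end of the file has not been recognized (finalized) yet.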

Two possible fixes:

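  1. Call .Close() on the PushAudioInputStream, or write an empty buffer (.Write(new byte[0])), to explicitly mark the end of the stream and let the SDK wrap things up without waiting for more data.
  2. If the input is just a file, consider using AudioConfig.FromWavFileInput so that you do not need to do any of these steps yourself.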
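
Both fixes could look roughly like the sketch below. It is not a drop-in replacement: the class and method names (RecognizeFromFile, RecognizeWithPushStreamAsync, RecognizeWithWavFileAsync) are made up for illustration, and "mykey", "myregion" and "myfilepath" are the same placeholders as in the question. Only the .Close() call and FromWavFileInput are the actual changes.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

static class RecognizeFromFile // hypothetical name, for illustration only
{
    // Fix 1: keep the push stream, but close it once every byte has been written.
    static async Task RecognizeWithPushStreamAsync(SpeechConfig speechConfig, string path)
    {
        using (var audioStream = new PushAudioInputStream())
        using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            audioStream.Write(File.ReadAllBytes(path));
            // Marks the end of the stream so the SDK stops waiting for more audio.
            audioStream.Close();

            var result = await recognizer.RecognizeOnceAsync();
            Console.WriteLine($"{result.Reason}: {result.Text}");
        }
    }

    // Fix 2: let the SDK read the wave file itself; it knows where the file ends.
    static async Task RecognizeWithWavFileAsync(SpeechConfig speechConfig, string path)
    {
        using (var audioConfig = AudioConfig.FromWavFileInput(path))
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            var result = await recognizer.RecognizeOnceAsync();
            Console.WriteLine($"{result.Reason}: {result.Text}");
        }
    }

    static async Task Main()
    {
        // Placeholder credentials and path, as in the question.
        var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
        speechConfig.SpeechRecognitionLanguage = "es-MX";

        await RecognizeWithWavFileAsync(speechConfig, "myfilepath");
        await RecognizeWithPushStreamAsync(speechConfig, "myfilepath");
    }
}
```

If the push stream is fed from something other than a file (for example, chunks arriving over a socket), the first variant is the one that still applies: call Close() when the last chunk has been written.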

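As a side note: I would not recommend calling .Dispose on these SDK objects from within callbacks (events) raised by those same objects. If other callbacks are still waiting to be dispatched after the callback that called Dispose has completed, this can lead to some interesting situations. If you need more prompt disposal than IDisposable gives you through a proper using statement, then doing it on the main thread (for example, by awaiting a TaskCompletionSource that the callback signals on completion) or on a new task thread (Task.Run(() => cleanup())) will avoid any potential teardown and event concurrency problems.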
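
A minimal sketch of that pattern for the continuous-recognition case, assuming file input via FromWavFileInput. The method name RecognizeContinuousAsync, the 60-second cap (borrowed from the question) and the choice to treat both Canceled and SessionStopped as "finished" are my own assumptions, not something the SDK requires. The callbacks only signal; stopping and disposal happen on the calling side once the callbacks are out of the way.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

static class ContinuousFromFile // hypothetical name, for illustration only
{
    static async Task RecognizeContinuousAsync(SpeechConfig speechConfig, string path)
    {
        // RunContinuationsAsynchronously keeps the awaiting code from resuming
        // inline on the SDK's callback thread when the source is completed.
        var done = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        using (var audioConfig = AudioConfig.FromWavFileInput(path))
        using (var recognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            recognizer.Recognized += (sender, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine($"Recognized text: {e.Result.Text}");
                }
            };
            recognizer.Canceled += (sender, e) =>
            {
                // Also raised when the end of the input is reached.
                Console.WriteLine($"Canceled: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}");
                done.TrySetResult(true); // signal only; no Dispose in here
            };
            recognizer.SessionStopped += (sender, e) => done.TrySetResult(true);

            await recognizer.StartContinuousRecognitionAsync();

            // Wait for the signal, with a 60-second upper bound as in the question.
            await Task.WhenAny(done.Task, Task.Delay(TimeSpan.FromSeconds(60)));

            await recognizer.StopContinuousRecognitionAsync();
        } // the using blocks dispose the recognizer and audio config here, outside any callback
    }

    static async Task Main()
    {
        // Placeholder credentials and path, as in the question.
        var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
        speechConfig.SpeechRecognitionLanguage = "es-MX";
        await RecognizeContinuousAsync(speechConfig, "myfilepath");
    }
}
```

Unlike the question's version, the Recognized handler here does not end the session after the first phrase, so a file containing several utterances is transcribed in full.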