Microsoft Cognitive SpeechRecognizer Stuck
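I'm trying to use the MS Cognitive Speech SDK to do speech-to-text on some wave files. It works fine for some files but gets stuck on others. By stuck, I mean it does not stop until it is canceled manually.
I first tried the RecognizeOnceAsync method: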
private static void processRecording()
{
    var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
    speechConfig.SpeechRecognitionLanguage = "es-MX";
    speechConfig.OutputFormat = OutputFormat.Detailed;
    using (var audioStream = new PushAudioInputStream())
    {
        audioStream.Write(File.ReadAllBytes("myfilepath"));
        using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
        {
            using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
            {
                var result = speechRecognizer.RecognizeOnceAsync().Result;
                switch (result.Reason)
                {
                    case ResultReason.RecognizedSpeech:
                        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
                        break;
                    case ResultReason.NoMatch:
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                        break;
                    case ResultReason.Canceled:
                        var cancellation = CancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}, ErrorCode={cancellation.ErrorCode}, ErrorDetails={cancellation.ErrorDetails}");
                        break;
                }
            }
        }
    }
}
Then I get (after one minute):
CANCELED: Reason=Error, ErrorCode=ServiceTimeout, ErrorDetails=Timeout: no recognition result received SessionId: 322853a3085d41ec9b60ee940531038c
Then I tried StartContinuousRecognitionAsync:
private async static Task processRecordingsAsync()
{
    var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
    speechConfig.SpeechRecognitionLanguage = "es-MX";
    speechConfig.OutputFormat = OutputFormat.Detailed;
    var waiter = new System.Threading.ManualResetEvent(false);
    var audioStream = new PushAudioInputStream();
    audioStream.Write(File.ReadAllBytes("myfilepath"));
    var audioConfig = AudioConfig.FromStreamInput(audioStream);
    var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Action cleanup = () =>
    {
        waiter.Set();
        try { speechRecognizer.Dispose(); } catch { }
        try { audioConfig.Dispose(); } catch { }
        try { audioStream.Dispose(); } catch { }
    };
    speechRecognizer.Recognizing += (sender, e) => Console.WriteLine($"Recognizing: {e.Result.Text}");
    speechRecognizer.SessionStarted += (sender, e) => Console.WriteLine($"Recognize session started");
    speechRecognizer.SessionStopped += (sender, e) => Console.WriteLine($"Recognize session stopped");
    speechRecognizer.SpeechEndDetected += (sender, e) => Console.WriteLine($"Speech end detected");
    speechRecognizer.SpeechStartDetected += (sender, e) => Console.WriteLine($"Speech start detected");
    speechRecognizer.Recognized += (sender, e) =>
    {
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            Console.WriteLine($"Recognized text: {e.Result.Text}");
        }
        else
        {
            Console.WriteLine($"Could not recognize text: {e.Result.Reason}");
        }
        cleanup();
    };
    speechRecognizer.Canceled += (sender, e) =>
    {
        Console.WriteLine($"Error trying to recognize text: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}, ErrorDetails = {e.ErrorDetails}");
        cleanup();
    };
    await speechRecognizer.StartContinuousRecognitionAsync();
    if (!waiter.WaitOne(60000))
    {
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
Then I get:
Recognize session started
Speech start detected
Recognizing: con el
Recognizing: con el servicio de tele
Recognizing: con el servicio de tele terapia
Recognizing: con el servicio de tele terapia de
Recognizing: con el servicio de tele terapia de tercer
Recognize session stopped
Error trying to recognize text: Reason = Error, ErrorCode = ServiceTimeout, ErrorDetails = Timeout while waiting for service to stop SessionId: e289298cf97447b89bd088a665e6c095
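So it gets through about 90% of the file (which is about 4 seconds long), but then it gets stuck and only ends when I force it with StopContinuousRecognitionAsync.
When I try this file in Speech Studio, it recognizes almost exactly the same text, but it does not get stuck.
Note that I am using a free subscription. Could that be the reason? Is there anything else I can try?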
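The reason you're seeing this is that the audio input stream in use is still patiently "waiting" for the possibility that more data will be pushed to it. The stream has no way of knowing that this is the complete content of a file rather than a real-time input stream that has merely been blocked for a few seconds. If there isn't enough trailing silence appended to the end of the stream, that hypothetical future data could even affect the final recognition result you would receive, which is why the end of the file appears not to be recognized (it has not been finalized yet).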
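Two possible fixes:
- Call .Close() on the PushAudioInputStream, or write an empty buffer (.Write(new byte[0])), to explicitly mark the end of the stream and allow the SDK to wrap things up without waiting for more data.
- If the input is just a file, consider using AudioConfig.FromWavFileInput to avoid needing to do any of these steps yourself.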
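Both fixes can be sketched roughly as follows. This is a minimal sketch rather than a tested program: it assumes the Microsoft.CognitiveServices.Speech NuGet package, the class and method names (RecognizeFromFile, WithClosedPushStreamAsync, WithWavFileInputAsync) are made up for illustration, and "mykey", "myregion" and "myfilepath" are the placeholders from the question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

static class RecognizeFromFile
{
    // Fix 1: keep the push stream, but close it once all data has been written
    // so the SDK knows no more audio is coming.
    public static async Task WithClosedPushStreamAsync(SpeechConfig speechConfig, string wavPath)
    {
        using (var audioStream = new PushAudioInputStream())
        using (var audioConfig = AudioConfig.FromStreamInput(audioStream))
        using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            audioStream.Write(File.ReadAllBytes(wavPath));
            audioStream.Close(); // explicitly marks the end of the stream

            var result = await speechRecognizer.RecognizeOnceAsync();
            Console.WriteLine($"{result.Reason}: {result.Text}");
        }
    }

    // Fix 2: let the SDK read the file itself, which handles end-of-input for you.
    public static async Task WithWavFileInputAsync(SpeechConfig speechConfig, string wavPath)
    {
        using (var audioConfig = AudioConfig.FromWavFileInput(wavPath))
        using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            var result = await speechRecognizer.RecognizeOnceAsync();
            Console.WriteLine($"{result.Reason}: {result.Text}");
        }
    }

    public static async Task Main()
    {
        // Placeholder key, region and path, as in the question.
        var speechConfig = SpeechConfig.FromSubscription("mykey", "myregion");
        speechConfig.SpeechRecognitionLanguage = "es-MX";

        await WithClosedPushStreamAsync(speechConfig, "myfilepath");
        await WithWavFileInputAsync(speechConfig, "myfilepath");
    }
}
```

Either variant should let RecognizeOnceAsync return once the audio has been consumed instead of waiting for the service timeout.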
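As a side note: I don't recommend calling .Dispose on these SDK objects from within callbacks (events) raised by those same objects. If other callbacks are still pending dispatch after the callback that calls Dispose completes, this can lead to some interesting situations. If you need disposal more promptly than IDisposable would give you via a well-scoped using statement, doing it on the main thread (for example, by awaiting a TaskCompletionSource that is signaled on completion) or on a new task thread (Task.Run(() => cleanup())) will avoid any potential concurrency issues between teardown and events.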
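The TaskCompletionSource approach might look something like the sketch below. It is an illustration under assumptions, not a tested implementation: ContinuousRecognition and RunAsync are hypothetical names, the 60-second limit is carried over from the question, and file input is used so that end-of-stream is signaled by the SDK. The callbacks only signal completion; disposal happens when the using blocks exit in the awaiting method.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

static class ContinuousRecognition
{
    public static async Task RunAsync(SpeechConfig speechConfig, string wavPath)
    {
        // RunContinuationsAsynchronously keeps the code after the await
        // (including disposal) from running inline on the SDK's callback thread.
        var finished = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        using (var audioConfig = AudioConfig.FromWavFileInput(wavPath))
        using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
        {
            speechRecognizer.Recognized += (sender, e) =>
                Console.WriteLine($"Recognized ({e.Result.Reason}): {e.Result.Text}");

            speechRecognizer.Canceled += (sender, e) =>
            {
                Console.WriteLine($"Canceled: Reason = {e.Reason}, ErrorCode = {e.ErrorCode}, ErrorDetails = {e.ErrorDetails}");
                finished.TrySetResult(false); // signal only, no Dispose here
            };

            speechRecognizer.SessionStopped += (sender, e) =>
                finished.TrySetResult(true); // signal only, no Dispose here

            await speechRecognizer.StartContinuousRecognitionAsync();

            // Wait for the session to end, with a safety timeout.
            await Task.WhenAny(finished.Task, Task.Delay(TimeSpan.FromSeconds(60)));

            await speechRecognizer.StopContinuousRecognitionAsync();
        } // recognizer and audio config are disposed here, outside any callback
    }
}
```

The design choice is that event handlers never touch the lifetime of the objects that raise them, so a callback that is still queued cannot run against an already disposed recognizer.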