Microsoft.CognitiveServices.Speech.DetailedSpeechRecognitionResultCollection error
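We are experimenting with speech-to-text using Microsoft Cognitive Services. One of our requirements is word-level timestamps. This works for short WAV files, say 2-3 minutes of audio, but for larger files we get the following error: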
"There was an error deserializing the object of type Microsoft.CognitiveServices.Speech.DetailedSpeechRecognitionResultCollection. The value '2152200000' cannot be parsed as the type 'Int32'."
Any hints on how I can work around this would be greatly appreciated. Thanks in advance!
Code snippet:
config.OutputFormat = OutputFormat.Detailed;
config.RequestWordLevelTimestamps();
using (var audioInput = AudioConfig.FromWavFileInput(wavfile))
{
    using var recognizer = new SpeechRecognizer(config, audioInput);
    recognizer.Recognized += (s, e) =>
    {
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            var framesStart = TimeSpan.FromTicks(e.Result.OffsetInTicks).TotalMilliseconds / 40;
            var te = new TranscriptElement((long)framesStart, e.Result.Text, languageCode);
            // Eventually fails on the following line:
            var words = e.Result.Best().OrderByDescending(x => x.Confidence).First().Words;
            foreach (var w in words.OrderBy(w => w.Offset))
            {
                var start = TimeSpan.FromTicks(w.Offset).TotalMilliseconds / 40;
                var duration = TimeSpan.FromTicks(w.Duration).TotalMilliseconds / 40;
                te.SingleWords.Add(new TranscriptSingleWord((long)start, (long)(start + duration), w.Word));
            }
            transcriptElements.Add(te);
        }
        else if (e.Result.Reason == ResultReason.NoMatch)
        {
            _logger.LogError($"NOMATCH: Speech could not be recognized.");
        }
    };
    await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
    Task.WaitAny(new[] { stopRecognition.Task });
    await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
}
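This is a bug in the data type the extension uses for offsets. Offsets are expressed in ticks (100 ns units), so an Int32 can only track about 214 s of audio: Int32.MaxValue is 2,147,483,647 ticks, roughly 214.7 s, and the value in your error, 2152200000, is about 215.2 s.
Until a fix is available, you can access the raw JSON that the Best() method uses through the SpeechServiceResponse_JsonResult property in the result's Properties collection.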
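The workaround described above could look roughly like the sketch below, which parses the raw JSON itself and reads word offsets as 64-bit values. This is a minimal sketch, not the SDK's implementation: `WordTiming` and `DetailedResultJson` are hypothetical names, and the JSON field names (`NBest`, `Confidence`, `Words`, `Word`, `Offset`, `Duration`) are assumed from the detailed output format, so check them against the JSON you actually receive.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical holder for one word; Offset and Duration are ticks (100 ns units).
public sealed class WordTiming
{
    public WordTiming(string word, long offset, long duration)
    {
        Word = word;
        Offset = offset;
        Duration = duration;
    }

    public string Word { get; }
    public long Offset { get; }
    public long Duration { get; }
}

public static class DetailedResultJson
{
    // Returns the words of the highest-confidence NBest entry, ordered by offset.
    // Field names are assumptions about the detailed JSON payload.
    public static List<WordTiming> ParseBestWords(string json)
    {
        var result = new List<WordTiming>();
        using var doc = JsonDocument.Parse(json);

        if (!doc.RootElement.TryGetProperty("NBest", out var nbest) ||
            nbest.ValueKind != JsonValueKind.Array)
        {
            return result;
        }

        // Pick the alternative with the highest confidence.
        JsonElement? best = null;
        var bestConfidence = double.NegativeInfinity;
        foreach (var candidate in nbest.EnumerateArray())
        {
            var confidence = candidate.TryGetProperty("Confidence", out var c) ? c.GetDouble() : 0.0;
            if (confidence > bestConfidence)
            {
                bestConfidence = confidence;
                best = candidate;
            }
        }

        if (best is null ||
            !best.Value.TryGetProperty("Words", out var words) ||
            words.ValueKind != JsonValueKind.Array)
        {
            return result;
        }

        foreach (var w in words.EnumerateArray())
        {
            // GetInt64 avoids the Int32 overflow that occurs past ~214 s of audio.
            result.Add(new WordTiming(
                w.GetProperty("Word").GetString() ?? string.Empty,
                w.GetProperty("Offset").GetInt64(),
                w.GetProperty("Duration").GetInt64()));
        }

        return result.OrderBy(w => w.Offset).ToList();
    }
}
```

Inside the Recognized handler, the JSON string would come from `e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)`, and `DetailedResultJson.ParseBestWords(json)` would replace the `e.Result.Best()...Words` call; the rest of the loop can stay as it is, since `TimeSpan.FromTicks` already takes a `long`.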