Windows Speech Recognition for conversations
How can I use Windows Speech Recognition for conversations in a UWP application? At the moment it only recognizes my voice, not the other speakers in the same conversation. Is there perhaps another API for this?
Here is my original code:
public sealed partial class MainPage : Page
{
    private SpeechRecognizer speechRecognizer;

    public MainPage()
    {
        this.InitializeComponent();
        this.speechRecognizer = new SpeechRecognizer();
        Init();
    }

    private async void Init()
    {
        // Compile the predefined grammar
        SpeechRecognitionCompilationResult result = await speechRecognizer.CompileConstraintsAsync();
        speechRecognizer.ContinuousRecognitionSession.ResultGenerated += ContinuousRecognitionSession_ResultGenerated;
        if (speechRecognizer.State == SpeechRecognizerState.Idle)
        {
            await speechRecognizer.ContinuousRecognitionSession.StartAsync();
        }
    }

    private void ContinuousRecognitionSession_ResultGenerated(SpeechContinuousRecognitionSession sender,
        SpeechContinuousRecognitionResultGeneratedEventArgs args)
    {
        // Console.WriteLine produces no visible output in a UWP app; use Debug.WriteLine
        System.Diagnostics.Debug.WriteLine(args.Result.Text);
    }
}
SpeechRecognizer does not identify who is speaking. It analyzes whatever audio comes in and converts it to text according to the active grammar. So the ideal conversation scenario for SpeechRecognizer is two people close to the microphone, speaking clearly, with only one person talking at a time. SpeechRecognizer provides a plain speech-to-text service; it offers no API to separate two speakers' voices and recognize each one individually.
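Given those limits, one partial mitigation is to filter results by the recognizer's confidence rating, so that distant, mumbled, or overlapping speech is dropped rather than transcribed as garbage. This is only a sketch: it still does not separate speakers, and treating only High/Medium confidence as acceptable is an assumption you may want to tune.

```csharp
using Windows.Media.SpeechRecognition;

// Replacement event handler: keep only results the recognizer is
// reasonably confident about (Low and Rejected results are discarded).
private void ContinuousRecognitionSession_ResultGenerated(SpeechContinuousRecognitionSession sender,
    SpeechContinuousRecognitionResultGeneratedEventArgs args)
{
    if (args.Result.Confidence == SpeechRecognitionConfidence.High ||
        args.Result.Confidence == SpeechRecognitionConfidence.Medium)
    {
        System.Diagnostics.Debug.WriteLine(args.Result.Text);
    }
}
```

For true speaker separation (diarization) you would need a different service than the in-box UWP SpeechRecognizer, such as a cloud speech API that supports conversation transcription.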