Creating voice command synonyms in GRXML

Question

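I created a voice-controlled UWP application in C++/CX (for HoloLens, if it matters). It is a very simple one, mostly based on some samples. This is the speech recognition event handler: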

void MyAppMain::HasSpoken(SpeechContinuousRecognitionSession ^sender, SpeechContinuousRecognitionResultGeneratedEventArgs ^args)
{
    if (args->Result->Confidence == SpeechRecognitionConfidence::Medium
        || args->Result->Confidence == SpeechRecognitionConfidence::High)
    {
        process_voice_command(args->Result->Text);
    }
}

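So far so good: the recognition result is in the args->Result->Text variable. Now, I only need to support a very limited set of voice commands and ignore everything else, but within this limited set I want some variability. It seems that the last example on this page is about exactly that. So, based on it, I made the following grammar file: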

<grammar version="1.0" xml:lang="en-US" root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0">

  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>        
        <item>advance</item>
      </one-of>
      <tag>out="next";</tag>
    </item>
  </rule>

</grammar>

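What I want with it is that when I say either "next", "go" or "advance", the recognition engine just returns "next", so that this is what ends up in args->Result->Text above. What it actually does for me now is limit the set of recognizable words to these three, but it simply returns the word I said, without converting it to "next". It looks like it either ignores the <tag> element, or I have to retrieve its content in a different way in my C++/CX program. Or <tag> doesn't work the way I think it does. What should I change to make it work?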

Answer 1

Or <tag> doesn't work the way I think it does

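A tag is a legal rule expansion. Tags do not affect the legal word patterns defined by a grammar, or the process of recognizing speech or other input given a grammar. For details, see the Tags section of the Speech Recognition Grammar Specification.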

What I want with it is that when I say either "next", "go" or "advance", the recognition engine just returns "next"

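Speech recognition converts the words spoken by the user into text for form input. Constraints, or grammars, define the spoken words and phrases that the speech recognizer can match. The grammar you use only defines the set of words that can be matched; it does not rewrite the recognized text. If you want "next", "go" and "advance" to execute the same command, you can handle them when you process the text result. For example: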

// Start recognition.
Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
// Do something with the recognition result.
if (speechRecognitionResult.Text == "go" || speechRecognitionResult.Text == "next" || speechRecognitionResult.Text == "advance")
{
    // All three words are synonyms here: run the same "next" command for each.
}

For details, see the official sample Scenario_SRGSConstraint, which contains the method HandleRecognitionResult.
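
Since the question is in C++, the same idea of handling synonyms in application code could be sketched in plain C++ roughly as follows. This is only an illustration: canonical_command is a hypothetical helper name, not part of any Windows API, and the table simply mirrors the three words in the grammar.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper (not a Windows API): maps a recognized word to one
// canonical command. Returns an empty string for words not in the table.
std::string canonical_command(const std::string& recognized_text)
{
    // The synonym table mirrors the three <item> entries in the grammar.
    static const std::unordered_map<std::string, std::string> synonyms = {
        {"next", "next"},
        {"go", "next"},
        {"advance", "next"},
    };

    const auto it = synonyms.find(recognized_text);
    return it != synonyms.end() ? it->second : std::string();
}
```

In the handler from the question, the result of canonical_command would be passed to process_voice_command instead of the raw text. Converting args->Result->Text (a Platform::String^) to std::string is omitted here.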

Answer 2

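I found a way to do what I want using SRGS (at least for the very simple case described in the question). It seems that <tag> doesn't change the recognition result directly (at least not with tag-format="semantics/1.0"; there are other tag-formats, as described, for example, here, and they may do something else). Instead, it populates an additional collection of properties. So this is how I changed my grammar: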

<grammar version="1.0" xml:lang="en-US" 
root="nextCommands" xmlns="http://www.w3.org/2001/06/grammar" 
tag-format="semantics/1.0">

  <rule id="nextCommands">
    <item>
      <one-of>
        <item>next</item>
        <item>go</item>        
        <item>advance</item>
      </one-of>
      <tag>out.HONEY="bunny";</tag>
    </item>
  </rule>

</grammar>

Now, when "next", "go" or "advance" is recognized, it still goes to args->Result->Text unchanged, but there is also a new pair in args->Result->SemanticInterpretation->Properties with the key HONEY and the value bunny. I can check whether that is the case with

args->Result->SemanticInterpretation->Properties->HasKey("HONEY");

and, if so, retrieve its value with

args->Result->SemanticInterpretation->Properties->Lookup("HONEY")->GetAt(0); //returns "bunny"
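
The check-then-lookup pattern can be wrapped in one small helper that falls back to the spoken text when the key is missing. The sketch below uses plain C++ stand-ins so that it compiles outside a UWP project: SemanticProperties is an assumed substitute for the Properties collection (string keys mapped to lists of string values), and resolve_command is a hypothetical name.

```cpp
#include <map>
#include <string>
#include <vector>

// Assumed stand-in for SemanticInterpretation->Properties: each key maps to
// a list of string values. The real collection is a WinRT map view.
using SemanticProperties = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: returns the first value stored under `key`, or the
// recognized text itself if the key is absent or has no values.
std::string resolve_command(const SemanticProperties& properties,
                            const std::string& key,
                            const std::string& recognized_text)
{
    const auto it = properties.find(key);              // plays the role of HasKey
    if (it != properties.end() && !it->second.empty())
    {
        return it->second.front();                     // plays the role of Lookup(key)->GetAt(0)
    }
    return recognized_text;
}
```

With the grammar above, a call such as resolve_command(properties, "HONEY", text) would yield the tag value for any of the three words, so the rest of the program only needs to handle that one value.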

c++-cx

uwp

grxml