Mixing languages in the same SSML

Question

If I send this short piece of SSML to the speech processor, I get two different voices:

<speak version='1.0' xml:lang='es-ES'>
  <voice xml:lang='es-ES' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (es-ES, Pablo, Apollo)'>
    <p>
        <s>Hola </s>
        <s xml:lang='en'>Hello</s>
        <s>¿Cómo estás?</s>
    </p>
  </voice>
</speak>

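A man speaks the Spanish and a woman speaks the English. Is this a limitation of the Project Oxford Text to Speech engine? In other words, I expected the same voice to speak both languages, but that is not what happens.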

Answer 1

Quoting the SSML spec:

Specifying xml:lang does not imply a change in voice, though this may indeed occur. When a given voice is unable to speak content in the indicated language, a new voice may be selected by the processor.

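The current fallback behavior leaves something to be desired, so the recommendation is to create multiple voice nodes and select the voice explicitly whenever you switch languages.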
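A minimal sketch of that approach, applied to the question's example: each language run gets its own `voice` element, so the processor never has to guess a fallback voice. The en-US voice name below (`BenjaminRUS`) is an assumption chosen to keep both speakers male; substitute a voice name that your service actually lists.

```xml
<speak version='1.0' xml:lang='es-ES'>
  <!-- Spanish run: explicitly select the Spanish male voice from the question -->
  <voice xml:lang='es-ES' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (es-ES, Pablo, Apollo)'>
    <s>Hola</s>
  </voice>
  <!-- English run: explicitly select an en-US voice (name is an assumption) -->
  <voice xml:lang='en-US' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (en-US, BenjaminRUS)'>
    <s>Hello</s>
  </voice>
  <!-- Back to Spanish: reopen the Spanish voice instead of relying on inheritance -->
  <voice xml:lang='es-ES' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (es-ES, Pablo, Apollo)'>
    <s>¿Cómo estás?</s>
  </voice>
</speak>
```

This still uses two different speakers; it only makes the switch deterministic and lets you pick a voice whose gender matches, rather than leaving the choice to the processor.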

Tags: text-to-speech, ssml, microsoft-cognitive