Google Cloud's rate and pitch prosody attributes

Question

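I'm new to Google Cloud's Text-to-Speech. The docs show a <prosody> tag with rate and pitch attributes, but these have no effect on my requests. For example, whether I use rate="slow" or rate="fast", or pitch="+2st" or pitch="-2st", the result sounds the same, unlike the example in the docs, which is audibly slower and lower-pitched.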

I made sure I have the latest version:

python3 -m pip install --upgrade google-cloud-texttospeech

Minimal reproducible example:

import os

from google.cloud import texttospeech

AUDIO_CONFIG = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file"

tts_client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name=  "en-US-Wavenet-A"
)

ssml_input = texttospeech.SynthesisInput(
    ssml='<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    # or this one:
    #ssml='<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>'
)

response = tts_client.synthesize_speech(
    input=ssml_input, voice=voice, audio_config=AUDIO_CONFIG
)

with open("/tmp/cloud.wav", 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)

How do I use Google Cloud's rate and pitch prosody attributes?

Answer 1

According to this document, when you pass an SSML script to Text-to-Speech, the markup should be wrapped in a <speak> element, like this:

<speak>

    <prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody>

</speak>
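
The format above can be sketched as a small helper that builds the SSML string. This is a minimal sketch, not part of the Text-to-Speech library: the name `build_prosody_ssml` and its parameters are hypothetical, and it only does string formatting with the standard library so that the text and attribute values are XML-escaped.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap `text` in <prosody> inside a <speak> root (hypothetical helper).

    `rate` and `pitch` are passed through unchanged; this sketch does not
    check that the service accepts the given values.
    """
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    # escape() protects characters such as & and < in the spoken text.
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


if __name__ == "__main__":
    print(build_prosody_ssml("Hi good morning have a nice day", rate="slow", pitch="low"))
```

The returned string can be passed as the ssml argument of texttospeech.SynthesisInput, as in the code below.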

You can refer to the code below. I tried it on my end and it works for me.

Code 1:

I set the pitch to low and the rate to slow.


from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
   ssml='<speak><prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
   language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
   audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
   input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
   # Write the response to the output file.
   out.write(response.audio_content)
   print('Audio content written to file "output.mp3"')

Audio output: output audio

Code 2:

I used a rate of fast and a pitch of +5st.


from google.cloud import texttospeech
client = texttospeech.TextToSpeechClient()

# Sets the text input to be synthesized
synthesis_input = texttospeech.SynthesisInput(
   ssml='<speak><prosody rate="fast" pitch="+5st">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
   language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
   audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
   input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
   # Write the response to the output file.
   out.write(response.audio_content)
   print('Audio content written to file "output.mp3"')

Audio output: output audio
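
Compared with the snippet in the question, the only change to the SSML is the <speak> root. As a rough sketch, an existing fragment can be wrapped before it is sent. The helper name `ensure_speak_root` is hypothetical, and the check uses only the standard-library XML parser, not anything from the Text-to-Speech client.

```python
import xml.etree.ElementTree as ET


def ensure_speak_root(ssml: str) -> str:
    """Return `ssml` wrapped in <speak> unless it already has that root.

    Hypothetical helper: it checks only the root element and does not
    validate the SSML against what the service supports.
    """
    stripped = ssml.strip()
    try:
        root = ET.fromstring(stripped)
    except ET.ParseError:
        # Not a single well-formed element (for example, sibling tags
        # with no common root), so treat it as a fragment and wrap it.
        return f"<speak>{stripped}</speak>"
    # Drop an XML namespace prefix such as "{http://...}speak" if present.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name == "speak":
        return stripped
    return f"<speak>{stripped}</speak>"


if __name__ == "__main__":
    fragment = '<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    print(ensure_speak_root(fragment))
```

With the fragment from the question, this yields the same shape as the SSML strings in Code 1 and Code 2.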
