Google Cloud's rate and pitch prosody attributes

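I'm new to Google Cloud's Text-to-Speech. The docs show a <prosody> tag with rate and pitch attributes, but these make no difference in my requests. For example, whether I use rate="slow" or rate="fast", or pitch="+2st" or pitch="-2st", the result sounds the same, and it does not match the example in the docs, which is slower and lower in pitch.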

I made sure I have the latest version:

python3 -m pip install --upgrade google-cloud-texttospeech

Minimal reproducible example:

import os

from google.cloud import texttospeech

AUDIO_CONFIG = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/file"

tts_client = texttospeech.TextToSpeechClient()
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-A",
)

ssml_input = texttospeech.SynthesisInput(
    ssml='<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
    # or this one:
    #ssml='<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>'
)

response = tts_client.synthesize_speech(
    input=ssml_input, voice=voice, audio_config=AUDIO_CONFIG
)

with open("/tmp/cloud.wav", 'wb') as out:
    # Write the response to the output file.
    out.write(response.audio_content)

How do I use Google Cloud's rate and pitch prosody attributes?

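According to this document, when you pass SSML in your Text-to-Speech code, the markup must be wrapped in a <speak> root element. The string in your example starts directly with <prosody>, so the SSML should be formatted like this instead: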

<speak>
    <prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody>
</speak>
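As a quick local sanity check before sending a request, the root element of an SSML string can be inspected with Python's standard XML parser. This is only a sketch: `has_speak_root` is a hypothetical helper name, and it checks nothing beyond well-formedness and the root tag, so it is not a substitute for the service's own validation.

```python
import xml.etree.ElementTree as ET


def has_speak_root(ssml: str) -> bool:
    """Return True if the string parses as XML and its root element is <speak>."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    return root.tag == "speak"


# The string from the question, and the same string wrapped in <speak>.
question_ssml = '<prosody rate="fast" pitch="+2st">Can you hear me now?</prosody>'
fixed_ssml = f"<speak>{question_ssml}</speak>"

print(has_speak_root(question_ssml))  # False
print(has_speak_root(fixed_ssml))     # True
```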

You can refer to the code below. I tried it on my end and it worked for me.

Code 1:

I set the pitch to low and the rate to slow.


from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Sets the SSML input to be synthesized; note the <speak> root element
synthesis_input = texttospeech.SynthesisInput(
    ssml='<speak><prosody rate="slow" pitch="low">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the SSML input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

Audio output: output audio

Code 2:

I used a rate of fast and a pitch of +5st.


from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# Sets the SSML input to be synthesized; note the <speak> root element
synthesis_input = texttospeech.SynthesisInput(
    ssml='<speak><prosody rate="fast" pitch="+5st">Hi good morning have a nice day</prosody></speak>'
)

# Builds the voice request, selects the language code ("en-US") and
# the SSML voice gender ("MALE")
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

# Selects the type of audio file to return
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Performs the text-to-speech request on the SSML input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# Writes the synthetic audio to the output file.
with open("output.mp3", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')

Audio output: output audio
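The two listings differ only in the rate and pitch values, so the SSML string could be built by a small helper instead of being written out by hand. The following is a minimal sketch using only the standard library; `build_prosody_ssml` is a hypothetical name, not part of the google-cloud-texttospeech package, and it does not check that the rate and pitch values are ones the service accepts.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody_ssml(text: str, rate: str, pitch: str) -> str:
    """Wrap text in <prosody> with the given rate and pitch, inside a <speak> root."""
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape protects characters such as & and < in the spoken text.
    return (
        f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


slow_low = build_prosody_ssml("Hi good morning have a nice day", "slow", "low")
fast_high = build_prosody_ssml("Hi good morning have a nice day", "fast", "+5st")
print(slow_low)
print(fast_high)
```

The resulting string can then be passed as the ssml argument of texttospeech.SynthesisInput, exactly as in the two listings above.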