Can the Google Speech API convert text to speech?
I am using the Google Speech API and have successfully converted speech to text with the following code.
import speech_recognition as sr
import os

# Obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

# Recognize speech using Google Cloud Speech
# INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""{KEY}
"""
try:
    speechOutput = r.recognize_google_cloud(
        audio,
        credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS,
        language="si-LK")
except sr.UnknownValueError:
    # The service responded but could not transcribe the audio
    speechOutput = "Google Cloud Speech could not understand audio"
except sr.RequestError as e:
    # The request itself failed (network, credentials, quota, ...)
    speechOutput = "Could not request results from Google Cloud Speech service; {0}".format(e)
print(speechOutput)
I would like to know whether the same API can convert text to speech. If not, which API should I use, and is there sample Python code for it?
Thanks!
For this you need to use the new Text-to-Speech API, which is in Beta as of now. You can find a Python quickstart in the Client Library section of the docs. The sample is part of the python-docs-samples repo. Adding the relevant part of the sample here for better visibility:
def synthesize_text(text):
    """Synthesizes speech from the input string of text."""
    # Note: this sample targets the pre-2.0 google-cloud-texttospeech client,
    # which exposes texttospeech.types and texttospeech.enums.
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()

    input_text = texttospeech.types.SynthesisInput(text=text)

    # Note: the voice can also be specified by name.
    # Names of voices can be retrieved with client.list_voices().
    voice = texttospeech.types.VoiceSelectionParams(
        language_code='en-US',
        ssml_gender=texttospeech.enums.SsmlVoiceGender.FEMALE)

    audio_config = texttospeech.types.AudioConfig(
        audio_encoding=texttospeech.enums.AudioEncoding.MP3)

    response = client.synthesize_speech(input_text, voice, audio_config)

    # The response's audio_content is binary.
    with open('output.mp3', 'wb') as out:
        out.write(response.audio_content)
        print('Audio content written to file "output.mp3"')
Update: rate and pitch configuration
You can wrap text elements in a <prosody> tag to modify rate and pitch. For example:
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
The possible values follow the W3 specification. The Text-to-Speech API's SSML docs describe them in detail and also provide some examples.
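To make the <prosody> example concrete, here is a minimal sketch that builds such a string in Python. The helper name prosody_ssml is hypothetical and not part of any Google library; it only assembles the markup and escapes the text. The assumption is that you would then pass the result to the client as SynthesisInput(ssml=...) rather than SynthesisInput(text=...).

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_ssml(text, rate, pitch):
    """Wrap plain text in <speak><prosody ...> markup (hypothetical helper)."""
    # escape() protects &, < and > in the spoken text;
    # quoteattr() quotes and escapes the attribute values.
    return "<speak><prosody rate={} pitch={}>{}</prosody></speak>".format(
        quoteattr(rate), quoteattr(pitch), escape(text)
    )


ssml = prosody_ssml("Can you hear me now?", rate="slow", pitch="-2st")
print(ssml)
```

Escaping matters here because a stray & or < in the input text would otherwise produce invalid SSML.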
In addition, you can control the general audio playback rate with the speed option of <audio>, which currently accepts values from 50% to 200% (in 1% increments).
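As a rough sketch of the speed option, the snippet below builds an <audio> element and rejects values outside the range quoted above. The helper name audio_ssml and the example.com URL are hypothetical placeholders, and the range check simply mirrors the 50% to 200% limit stated in this answer, which may change.

```python
from xml.sax.saxutils import escape, quoteattr


def audio_ssml(src, speed_percent=100, fallback=""):
    """Build an <audio> element with a speed attribute (hypothetical helper)."""
    # Range and whole-percent step are taken from the answer: 50% to 200%, 1% increments.
    if not isinstance(speed_percent, int) or not 50 <= speed_percent <= 200:
        raise ValueError("speed must be a whole percentage between 50 and 200")
    return "<speak><audio src={} speed={}>{}</audio></speak>".format(
        quoteattr(src), quoteattr("{}%".format(speed_percent)), escape(fallback)
    )


# Placeholder URL; replace with a real, reachable audio file.
print(audio_ssml("https://example.com/clip.mp3", speed_percent=80, fallback="clip"))
```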