MoviePy: importing audio from text-to-speech in memory

I'm trying to use Azure text-to-speech with MoviePy to create an audio stream for a video.

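from azure.cognitiveservices.speech import AudioDataStream
# synthesizer is a configured SpeechSynthesizer, xml_string is the SSML document
result = synthesizer.speak_ssml_async(xml_string).get()
stream = AudioDataStream(result)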

The output of this is:

<azure.cognitiveservices.speech.AudioDataStream at 0x2320cb87ac0>

However, MoviePy can't load it with the following command:

audioClip = AudioFileClip(stream)

This is the error I get:

'AudioDataStream' object has no attribute 'endswith'

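Do I need to convert the Azure stream to .wav? How would I do that? I need to do the whole thing without writing a .wav file locally (e.g. with stream.save_to_wav_file), using only an in-memory stream.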
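For context: AudioFileClip expects a path to a file and treats its argument as a string, which is why passing an AudioDataStream fails on `.endswith`. One way to stay in memory, sketched here under assumptions, is to take the synthesized bytes from `result.audio_data` and decode them with Python's standard `wave` module through `io.BytesIO`. This assumes the synthesizer produces a RIFF/WAV PCM format (for example `Riff24Khz16BitMonoPcm`, set with `speech_config.set_speech_synthesis_output_format(...)`), so the bytes are a complete WAV file. The helper name `wav_bytes_to_samples` is hypothetical, and the sketch only handles 16-bit PCM.

```python
import io
import sys
import wave
from array import array
from typing import List, Tuple


def wav_bytes_to_samples(wav_bytes: bytes) -> Tuple[List[List[float]], int]:
    """Decode 16-bit PCM WAV bytes held in memory (hypothetical helper).

    Returns (frames, sample_rate); each frame holds one float in
    [-1.0, 1.0) per channel.
    """
    # wave accepts any file-like object, so no file is written to disk.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    samples = array("h")
    samples.frombytes(raw)
    # WAV sample data is little-endian; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        samples.byteswap()

    # Scale int16 values to floats and group interleaved samples per frame.
    scaled = [s / 32768.0 for s in samples]
    frames = [scaled[i:i + channels] for i in range(0, len(scaled), channels)]
    return frames, rate
```

The frames could then be handed to MoviePy without a file, e.g. `AudioArrayClip(numpy.array(frames), fps=rate)` from `moviepy.audio.AudioClip` (MoviePy 1.x); treat that last step as an untested suggestion.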

Can anyone shed some light on this?

I wrote an HTTP-triggered Python function for you; try the code below:

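import os
import tempfile

import azure.functions as func
import azure.cognitiveservices.speech as speechsdk
from moviepy.editor import AudioFileClip

# MoviePy 1.x obtains its ffmpeg binary through the imageio-ffmpeg package
# (pip install imageio-ffmpeg); the old imageio.plugins.ffmpeg.download()
# helper is deprecated in current imageio releases.

speech_key = "<speech service key>"
service_region = "<speech service region>"
temp_file_path = os.path.join(tempfile.gettempdir(), "result.wav")
text = 'hello, this is a test'


def main(req: func.HttpRequest) -> func.HttpResponse:
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Let the service detect the language of the input text.
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig()

    # audio_config=None keeps the audio in the result object instead of
    # sending it to the default speaker.
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect_source_language_config,
        audio_config=None)

    result = speech_synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        # No audio was produced, so there is no file for MoviePy to open.
        return func.HttpResponse(f"Synthesis failed: {result.reason}", status_code=500)

    # Write the synthesized audio to a temporary .wav file that MoviePy can open.
    stream = speechsdk.AudioDataStream(result)
    stream.save_to_wav_file(temp_file_path)

    myclip = AudioFileClip(temp_file_path)
    duration = myclip.duration
    myclip.close()

    return func.HttpResponse(str(duration))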

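The logic is simple: get the audio stream from the speech service, save it to a temporary file, then load that file with AudioFileClip to get its duration.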
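If the duration is all that's needed, as in the function above, it can also be computed without the temporary file: a WAV file's duration is its frame count divided by its sample rate, and the standard `wave` module reads both from the header. This is a sketch assuming the bytes come from `result.audio_data` with a RIFF/WAV output format; `wav_duration_seconds` is a hypothetical helper name.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of an in-memory WAV payload: frame count / sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Under the same assumption, `func.HttpResponse(str(wav_duration_seconds(result.audio_data)))` inside `main` would replace the file round-trip.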

If you still run into errors, you can find the error details here:

Let me know if you have any other questions.