Azure text to speech: convert SpeakTextAsync output to a valid NAudio WaveStream

Question

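I'm trying to use the Azure text-to-speech service (Microsoft.CognitiveServices.Speech) to convert text to audio, and then use NAudio to convert that audio to another format.

I already have the NAudio part working with mp3 files, but I can't get any output from SpeakTextAsync that works with NAudio.

Here is the code I'm using to play the audio with NAudio (as a temporary test), but it doesn't play anything valid: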

var waveStream = new RawSourceWaveStream(azureStream, new WaveFormat());
using (var waveOut = new WaveOutEvent())
{
    waveOut.Init(waveStream);
    Log.Logger.Debug("Playing sounds...");
    waveOut.Play();
    while (waveOut.PlaybackState == PlaybackState.Playing)
    {
        Thread.Sleep(1000);
    }
}

These are the two possible outputs I have found, but I may be missing something important:

Option 1 (AudioDataStream):

using var synthesizer = new SpeechSynthesizer(_config, null);
using var result = await synthesizer.SpeakTextAsync(text);
switch (result.Reason)
{
    case ResultReason.SynthesizingAudioCompleted:
        Console.WriteLine($"Speech synthesized to speaker for text [{text}]");
        return AudioDataStream.FromResult(result);
    case ResultReason.Canceled:
    {
        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        return null;
    }
    default:
        return null;
}

Option 2 (PullAudioOutputStream):

PullAudioOutputStream stream = new PullAudioOutputStream();
AudioConfig config = AudioConfig.FromStreamOutput(stream);

using var synthesizer = new SpeechSynthesizer(_config, null);
using var result = await synthesizer.SpeakTextAsync(text);
switch (result.Reason)
{
    case ResultReason.SynthesizingAudioCompleted:
        Console.WriteLine($"Speech synthesized to speaker for text [{text}]");
        return stream;
    case ResultReason.Canceled:
    {
        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
        return null;
    }
    default:
        return null;
}

So how do I convert text to speech in a format that NAudio can use?

Answer 1

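Kevin,

Why do you need NAudio? If it is only for playback, it isn't necessary. The following line plays the text aloud, provided the synthesizer was created with the default speaker output (for example new SpeechSynthesizer(_config)) rather than with a null audio config: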

await synthesizer.SpeakTextAsync(text);
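As a rough sketch of that difference, the two ways of constructing the synthesizer could look like this. The class and method names are hypothetical, and `config` is assumed to be a SpeechConfig already set up with your key and region.

```csharp
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class, for illustration only.
public static class SpeakSketch
{
    // Default audio output: the SDK plays the speech on the default speaker.
    public static async Task SpeakAloudAsync(SpeechConfig config, string text)
    {
        using var synthesizer = new SpeechSynthesizer(config);
        using var result = await synthesizer.SpeakTextAsync(text);
    }

    // Null audio config: nothing is played, the audio is only kept in the result.
    // The cast avoids overload ambiguity in newer SDK versions.
    public static async Task<byte[]> SynthesizeToMemoryAsync(SpeechConfig config, string text)
    {
        using var synthesizer = new SpeechSynthesizer(config, null as AudioConfig);
        using var result = await synthesizer.SpeakTextAsync(text);

        // AudioData is only meaningful when synthesis completed.
        return result.Reason == ResultReason.SynthesizingAudioCompleted
            ? result.AudioData
            : null;
    }
}
```

The second method is the variant that matches the code in the question, which passes null as the audio config.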

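If you need to use the synthesis result with NAudio for any other reason, wrap result.AudioData in a MemoryStream and read it with WaveFileReader: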

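if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    // result.AudioData holds the synthesized audio as a byte array.
    // WaveFileReader expects WAV (RIFF) data, so this relies on a RIFF output format.
    using var stream = new MemoryStream(result.AudioData);
    using var reader = new WaveFileReader(stream);
    using var player = new WaveOutEvent();

    player.Init(reader);
    player.Play();

    // Play() returns immediately, so wait here until playback has finished.
    while (player.PlaybackState == PlaybackState.Playing)
    {
        Thread.Sleep(500);
    }
}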
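Since the goal in the question is conversion rather than playback, here is a minimal sketch of how the bytes could be opened and resampled with NAudio. The helper names are hypothetical. It assumes that audio without a RIFF header is raw 16-bit mono PCM at 16 kHz, so adjust `rawSampleRate` if you configured a different output format. This matters because `new WaveFormat()` in the question defaults to 44.1 kHz stereo, which would not match such audio.

```csharp
using System.IO;
using System.Text;
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Hypothetical helper class, for illustration only.
public static class AzureAudioHelper
{
    // Opens synthesized audio bytes as an NAudio WaveStream.
    // Assumption: headerless data is 16-bit mono PCM at rawSampleRate.
    public static WaveStream OpenAsWaveStream(byte[] audio, int rawSampleRate = 16000)
    {
        var memory = new MemoryStream(audio);

        // WAV data starts with the ASCII tag "RIFF".
        bool hasRiffHeader = audio.Length >= 12
            && Encoding.ASCII.GetString(audio, 0, 4) == "RIFF";

        if (hasRiffHeader)
        {
            // The header describes the format, so let NAudio parse it.
            return new WaveFileReader(memory);
        }

        // No header: NAudio must be told the format explicitly.
        return new RawSourceWaveStream(memory, new WaveFormat(rawSampleRate, 16, 1));
    }

    // Resamples the source and writes it as a 16-bit WAV file.
    public static void ResampleToFile(WaveStream source, int targetSampleRate, string outputPath)
    {
        var resampler = new WdlResamplingSampleProvider(source.ToSampleProvider(), targetSampleRate);
        WaveFileWriter.CreateWaveFile16(outputPath, resampler);
    }
}
```

With these helpers, `using var wave = AzureAudioHelper.OpenAsWaveStream(result.AudioData);` followed by `AzureAudioHelper.ResampleToFile(wave, 44100, "speech-44k.wav");` would write a 44.1 kHz copy. WdlResamplingSampleProvider is used here because it is fully managed. Other NAudio resamplers would work as well.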

.net

synthesizer

azure

.net-core-3.0

.net-core-3.1