Speech service documentation says the text-to-speech quota is 200/sec, but I can't get higher than 20/min (600 times slower)

Using the Microsoft Speech SDK (Microsoft.CognitiveServices.Speech) 1.20.0, I am trying to implement Azure Cognitive Services text-to-speech as an upgrade from the Microsoft Speech Platform.

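The documentation (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-services-quotas-and-limits#text-to-speech-quotas-and-limits-per-resource) says 200 transactions per second, but every time I test it, the service starts rejecting requests at about 20 per minute (600 times slower). This is on the F0 free tier, although the standard tier is also listed as starting at 200/s.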
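To make the "600 times" figure explicit, here is a small arithmetic check. It only uses the two numbers quoted above; the variable names are mine.

```python
# Compare the documented limit with the observed limit on a per-minute basis.
DOCUMENTED_PER_SECOND = 200  # figure quoted from the quotas page
OBSERVED_PER_MINUTE = 20     # rate at which requests start being rejected

documented_per_minute = DOCUMENTED_PER_SECOND * 60
slowdown = documented_per_minute / OBSERVED_PER_MINUTE

print(f"{documented_per_minute} vs {OBSERVED_PER_MINUTE} requests per minute: "
      f"{slowdown:.0f}x slower")
```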

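A basic test that creates a new SpeechSynthesizer and then calls SpeakSsmlAsync in a loop reliably fails at 20 per minute. Each call is awaited, so there should only be one concurrent connection. On the 21st call within a minute, the result is Canceled, with the error BadRequest "Connection was closed by the remote host. Error code: 1007. Error details: Throttled due to too many requests. USP state: 3. Received audio size: 0 bytes."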

Based on my research, Microsoft has a recent official document on how to Lower speech synthesis latency using Speech SDK.

Normally, we measure the latency by first byte latency and finish latency.

The first byte latency is much lower than finish latency in most cases. The first byte latency is independent from text length, while finish latency increases with text length.

Ideally, we want to minimize the user-experienced latency (the latency before user hears the sound) to one network route trip time plus the first audio chunk latency of the speech synthesis service.
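As a rough illustration of the two metrics described above, the sketch below times any iterable of audio chunks. It does not call the Speech SDK: `measure_latency` and `fake_stream` are hypothetical names, and the fake stream only stands in for a streaming synthesis response.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_latency(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (first_byte_latency, finish_latency, total_bytes).

    Latencies are in seconds, measured from just before iteration starts.
    first_byte_latency is None if no non-empty chunk ever arrives.
    """
    start = clock()
    first_byte: Optional[float] = None
    total = 0
    for chunk in chunks:
        if first_byte is None and chunk:
            first_byte = clock() - start  # time until the first audio bytes
        total += len(chunk)
    finish = clock() - start  # time until the whole stream is consumed
    return first_byte, finish, total


def fake_stream():
    # Hypothetical stand-in for a streaming synthesis response.
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00" * 320


if __name__ == "__main__":
    first, finish, size = measure_latency(fake_stream())
    print(f"first byte: {first:.3f}s, finish: {finish:.3f}s, {size} bytes")
```

With a real synthesizer, the chunks would come from whatever streaming interface the SDK exposes in your language. Note that these latency numbers are separate from the request-rate limit discussed in the question.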

The solution supports C#, C++, Java, Python, and Objective-C. I believe that by implementing the recommendations given there, you may get better results.

I posted a query on the documentation here, and it has now been updated to clarify that the free (F0) tier is limited to 20 requests per 60 seconds.
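Given that limit, one way to avoid the throttling error on F0 is to pace requests on the client side. Below is a minimal sketch of a sliding-window limiter, assuming the service counts requests over a rolling 60-second window (the documentation quoted here does not say exactly how the window is computed). `SlidingWindowLimiter` is a hypothetical helper, not part of the Speech SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `period` seconds."""

    def __init__(
        self,
        max_calls: int = 20,      # F0 limit stated above
        period: float = 60.0,     # window length in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()  # start times of recent calls

    def acquire(self) -> float:
        """Block until another call is allowed; return seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget calls that have already left the window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return waited
            # Window is full: wait until the oldest call expires.
            delay = self.period - (now - self._stamps[0])
            self._sleep(delay)
            waited += delay


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for i in range(3):
        limiter.acquire()
        # Issue the synthesis request here (SpeakSsmlAsync in the C# SDK).
        print(f"request {i + 1} allowed")
```

Calling `acquire()` before each synthesis request lets the first 20 calls through immediately and makes the 21st wait until the oldest call is 60 seconds old. The clock and sleep functions are injectable only so the behavior can be checked without actually waiting.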