Google Speech API streaming audio exceeding 1 minute

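I would like to extract a person's utterances from a telephone audio stream. The phone audio is routed to my server, which then creates a streaming recognition request. How can I tell whether a word belongs to a completed utterance or to the utterance that is currently being transcribed? Should I compare the timestamps between words? Will the API continue to return interim results even when the streamed phone audio contains no speech for a while? How can I exceed the 1-minute limit on streaming audio?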

Regarding your first 3 questions:

You don't need to compare the timestamps between words. You can tell whether a word is part of a complete utterance (a final result) by looking at the is_final flag in the Streaming Recognition Result. If the flag is set to true, the response corresponds to a completed transcription; otherwise, it is an interim result. More on this here.

Once you have received a final result, no interim results should be generated until a new utterance is streamed.
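As a rough illustration of checking is_final, here is a minimal Python sketch. It assumes each streamed response exposes `results`, and each result exposes `is_final` and `alternatives[0].transcript`, as the streaming response does. The `fake_response` helper and the `collect_utterances` name are hypothetical stand-ins so the snippet runs without credentials or the client library.

```python
from types import SimpleNamespace


def fake_response(transcript, is_final):
    """Hypothetical stand-in for one streamed response (not the real client type)."""
    alternative = SimpleNamespace(transcript=transcript)
    result = SimpleNamespace(alternatives=[alternative], is_final=is_final)
    return SimpleNamespace(results=[result])


def collect_utterances(responses):
    """Separate completed utterances from the interim text still in progress."""
    finals = []   # transcripts flagged is_final: completed utterances
    interim = ""  # latest interim transcript of the current utterance
    for response in responses:
        for result in response.results:
            if not result.alternatives:
                continue
            transcript = result.alternatives[0].transcript
            if result.is_final:
                finals.append(transcript)
                interim = ""  # the utterance is complete; start fresh
            else:
                interim = transcript  # later interim results replace earlier ones
    return finals, interim


stream = [
    fake_response("hello", False),
    fake_response("hello world", False),
    fake_response("hello world", True),
    fake_response("how are", False),
]
finals, interim = collect_utterances(stream)
print(finals, repr(interim))
```

With a real stream you would pass the iterator of responses returned by the streaming call instead of the fake list.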

Regarding your last question, you cannot exceed the 1-minute limit; you need to send multiple requests.
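One way to picture "multiple requests" is to group the incoming audio chunks into sessions that each stay under the limit, then open a new streaming request per session. The sketch below only does the grouping. The function name, the fixed per-chunk duration, and the 55-second safety margin are all assumptions for illustration, not part of the API.

```python
def split_into_sessions(chunks, chunk_seconds, limit_seconds=55.0):
    """Group audio chunks into sessions no longer than limit_seconds each.

    chunk_seconds is the (assumed constant) duration of every chunk.
    Each yielded list would be sent as its own streaming request.
    """
    session = []
    elapsed = 0.0
    for chunk in chunks:
        # Close the current session before it would exceed the limit.
        if session and elapsed + chunk_seconds > limit_seconds:
            yield session
            session, elapsed = [], 0.0
        session.append(chunk)
        elapsed += chunk_seconds
    if session:
        yield session  # remaining audio that did not fill a whole session


# 120 seconds of audio in 10-second chunks
audio = [b"\x00" * 4 for _ in range(12)]
for number, session in enumerate(split_into_sessions(audio, 10.0), start=1):
    print(f"request {number}: {len(session)} chunks")
```

Cutting purely on elapsed time can split a word across two requests. Starting the next request right after a final result arrives is likely to give cleaner boundaries, at the cost of slightly more bookkeeping.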

audio

speech-to-text

google-cloud-platform

google-speech-api