When should I use the enhanced video model with Google Cloud's Speech-to-Text API?

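The enhanced model for phone calls makes a lot of sense to me, since audio from phone calls usually has a specific quality/sound. However, I have no idea what the 'video' enhanced model brings, and there doesn't seem to be any documentation on it. Audio quality in videos can vary enormously, from a video broadcast recorded in a studio to someone's barely audible speech recorded on an iPhone outdoors in the wind. The audio compression in videos can also be all over the place. What specific scenarios is the 'video' model actually designed for? When would it be better than the default model or the phone call model?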

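The Speech-to-Text API offers prebuilt models, each best suited to specific scenarios. One of these is the Video model, which the documentation describes as follows: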

Use this model for transcribing audio from video clips or other sources (such as podcasts) that have multiple speakers. This model is also often the best choice for audio that was recorded with a high-quality microphone or that has lots of background noise. For best results, provide audio recorded at 16,000Hz or greater sampling rate.

Note: This is a premium model that costs more than the standard rate.

For more details on which model to use, see Selecting models.
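To make the model choice concrete, here is a minimal sketch of a request body for the v1 `speech:recognize` REST method that selects the video model. The helper name `build_recognize_request`, the bucket path, the `LINEAR16` encoding, and the `en-US` language code are assumptions for illustration only; the sketch builds the JSON and does not call the API.

```python
import json


def build_recognize_request(gcs_uri, model="video", sample_rate_hz=16000):
    """Build the JSON body for a v1 speech:recognize call (no network I/O).

    Hypothetical helper for illustration; adjust encoding and language
    to match your actual audio.
    """
    return {
        "config": {
            "encoding": "LINEAR16",            # assumed: uncompressed 16-bit PCM
            "sampleRateHertz": sample_rate_hz,  # docs recommend 16,000 Hz or greater
            "languageCode": "en-US",           # assumed language
            "model": model,                    # "video", "phone_call", or "default"
            "useEnhanced": True,               # request the enhanced variant
        },
        "audio": {"uri": gcs_uri},
    }


if __name__ == "__main__":
    # Hypothetical Cloud Storage path; replace with your own file.
    body = build_recognize_request("gs://my-bucket/interview.wav")
    print(json.dumps(body, indent=2))
```

Because the only thing that changes between models is the `model` string, one practical way to answer "when is it better?" for your own audio is to send the same clip with `"video"`, `"phone_call"`, and `"default"` and compare the transcripts.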