Can I specify the model (e.g. "video") in the Google Cloud Speech-to-Text API when using the gcloud tool?

Question

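Google's Speech-to-Text service offers several models for transcribing speech to text (standard, video, phone call, etc.). Google documents how to use these models when sending requests to the Speech-to-Text API from Python or via curl. However, I am using gcloud ml speech recognize to make requests to the API, and I want to be able to specify which model to use. I have read page after page of documentation trying to solve this, without success.

My command line: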

gcloud ml speech recognize test.wav --language-code=EN --useEnhanced=true

I also tried --model=video in place of --useEnhanced=true.

Google's response:

ERROR: (gcloud.ml.speech.recognize) unrecognized arguments: --useEnhanced=true

To search the help text of gcloud commands, run:
  gcloud help -- SEARCH_TERMS

Please help! Thanks :)

Answer 1

To specify a default model such as "video", you can use it as a group:

gcloud ml video // example

Here is the link to the gcloud reference: https://cloud.google.com/sdk/gcloud/reference/ml-engine/#GCLOUD-WIDE-FLAGS

Answer 2

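I couldn't get this to work with the gcloud tool, but I was able to do it "manually" with cURL. Follow the documentation here: https://cloud.google.com/speech-to-text/docs/quickstart-protocol. Make sure to create a service account with the appropriate role, download the generated private key, and then run export GOOGLE_APPLICATION_CREDENTIALS=path-to-credentials.json. Then create a JSON file that matches your requirements. Mine looks like this: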

{
    "config": {
        "languageCode": "en-US",
        "useEnhanced": true,
        "model": "video"
    },
    "audio": {
        "uri": "gs://bucket/audio.flac"
    }
}

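Then just run the cURL command that the documentation suggests for the recognize endpoint (taking care to change the file name to the JSON file you created), and you're good to go.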
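As a rough sketch of that last step, the request could look something like the following. The file name request.json and the helper function recognize are illustrative names that I made up, not part of any Google tooling. The sketch assumes that GOOGLE_APPLICATION_CREDENTIALS is already exported as described above and that gcloud and curl are on your PATH.

```shell
#!/usr/bin/env bash
# Sketch only: request.json and recognize() are illustrative names.

# Write the request body shown above to a file.
cat > request.json <<'EOF'
{
    "config": {
        "languageCode": "en-US",
        "useEnhanced": true,
        "model": "video"
    },
    "audio": {
        "uri": "gs://bucket/audio.flac"
    }
}
EOF

# POST a JSON request file to the v1 speech:recognize endpoint.
# The access token is derived from the service account key that
# GOOGLE_APPLICATION_CREDENTIALS points at.
recognize() {
    local request_file="$1"
    curl -s -X POST \
        -H "Content-Type: application/json; charset=utf-8" \
        -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
        --data @"${request_file}" \
        "https://speech.googleapis.com/v1/speech:recognize"
}
```

Calling recognize request.json should then send the request; the model and useEnhanced fields travel in the JSON body, so no gcloud flag is needed for them.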

Here is the documentation for the recognize endpoint: https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize. You can click through to the RecognitionConfig and RecognitionAudio objects to see what can be included in the JSON file.

google-api

speech-to-text

google-cloud-platform

gcloud