Possible sample rates in Google Speech-to-Text?

I'm using the function provided in the GCS documentation, which lets me transcribe audio files stored in Cloud Storage:

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    # google-cloud-speech >= 2.0 removed the separate `enums` and `types`
    # modules; the message types now live directly on `google.cloud.speech`.
    from google.cloud import speech
    client = speech.SpeechClient()

    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=48000,
        language_code='en-US')

    # Long-running (asynchronous) recognition for files stored in GCS.
    operation = client.long_running_recognize(config=config, audio=audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=2000)

    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
    return ' '.join(result.alternatives[0].transcript for result in response.results)

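In the sample, sample_rate_hertz is set to 16000 by default. I changed it to 48000, but I haven't been able to set it any higher, for example to 64k or 96k. Is 48k the upper limit for the sample rate?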
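To check what rate a FLAC file was actually recorded at, its sample rate can be read from the STREAMINFO metadata block, which the FLAC format requires to be the first metadata block in the file. This is a minimal stdlib-only sketch; the function names are hypothetical, and it only inspects the header bytes rather than validating the whole stream.

```python
def flac_sample_rate(header: bytes) -> int:
    """Return the sample rate in Hz stored in a FLAC STREAMINFO block.

    `header` must contain at least the first 21 bytes of the file:
    4-byte "fLaC" marker, 4-byte metadata block header, and the first
    13 bytes of STREAMINFO.
    """
    if len(header) < 21 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream, or header too short")
    # Low 7 bits of the block header's first byte give the block type; 0 is STREAMINFO.
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO starts at offset 8. After 2+2 bytes of block sizes and
    # 3+3 bytes of frame sizes, the next 20 bits hold the sample rate.
    b0, b1, b2 = header[18], header[19], header[20]
    return (b0 << 12) | (b1 << 4) | (b2 >> 4)


def flac_sample_rate_from_file(path: str) -> int:
    """Read just enough of a local FLAC file to report its sample rate."""
    with open(path, "rb") as f:
        return flac_sample_rate(f.read(21))
```

If this reports something above 48000 Hz, raising `sample_rate_hertz` in the request will not help.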

As stated in the documentation for Cloud Speech API, 48000 Hz is indeed the upper limit supported by this API.

Sample rates between 8000 Hz and 48000 Hz are supported within the Speech API.
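A minimal sketch of a client-side guard for this range, so an unsupported value fails before a long-running request is even submitted. The constant and function names are hypothetical; only the 8000 to 48000 Hz bounds come from the documentation quoted above.

```python
# Documented bounds for RecognitionConfig.sample_rate_hertz (inclusive).
MIN_SAMPLE_RATE_HZ = 8000
MAX_SAMPLE_RATE_HZ = 48000


def validate_sample_rate(rate_hz: int) -> int:
    """Return rate_hz unchanged, or raise ValueError if the API would reject it."""
    if not MIN_SAMPLE_RATE_HZ <= rate_hz <= MAX_SAMPLE_RATE_HZ:
        raise ValueError(
            f"sample_rate_hertz={rate_hz} is outside the supported range "
            f"{MIN_SAMPLE_RATE_HZ}-{MAX_SAMPLE_RATE_HZ} Hz; resample the audio first"
        )
    return rate_hz
```

With this check, `validate_sample_rate(48000)` passes while 64000 or 96000 raises immediately.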

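Therefore, if your audio was recorded at a higher sample rate, you have to resample it to 48000 Hz or lower before sending it to the API.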
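One way to do the resampling in Python is a polyphase resampler such as `scipy.signal.resample_poly`, which takes integer up/down factors and applies an anti-aliasing filter. The sketch below assumes SciPy is installed and that the audio has already been decoded into a NumPy array of samples; the helper names are hypothetical, and it simply caps the rate at 48000 Hz rather than picking an optimal target.

```python
from fractions import Fraction

MAX_SAMPLE_RATE_HZ = 48000  # upper limit documented for the Speech API


def target_rate(source_rate_hz: int, max_rate_hz: int = MAX_SAMPLE_RATE_HZ) -> int:
    """Keep the source rate if it is already supported, otherwise cap it."""
    return min(source_rate_hz, max_rate_hz)


def resample_factors(source_rate_hz: int, target_rate_hz: int) -> tuple:
    """Return (up, down) in lowest terms with source * up / down == target."""
    ratio = Fraction(target_rate_hz, source_rate_hz)
    return ratio.numerator, ratio.denominator


def resample_for_speech_api(samples, source_rate_hz: int):
    """Resample decoded audio (NumPy array, time on axis 0) to a supported rate.

    Returns (resampled_samples, new_rate_hz). Requires SciPy (assumption).
    """
    from scipy.signal import resample_poly

    new_rate = target_rate(source_rate_hz)
    up, down = resample_factors(source_rate_hz, new_rate)
    if up == down:
        return samples, new_rate  # already within the supported range
    return resample_poly(samples, up, down, axis=0), new_rate
```

For a 96 kHz recording this yields `up=1, down=2`, and for 64 kHz `up=3, down=4`. The result still has to be re-encoded (for example as FLAC) and uploaded, and the request should then use the returned rate as `sample_rate_hertz`.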

I also recommend taking a look at this other page, where you can find basic information about the features supported by the Cloud Speech API.