Possible sample rates in Google Speech-to-Text?

Question

I am using the function provided in the GCS documentation, which lets me transcribe audio stored in Cloud Storage:

def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=48000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=2000)

    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
    return ' '.join(result.alternatives[0].transcript for result in response.results)

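By default, sample_rate_hertz is set to 16000. I changed it to 48000, but I have not been able to set it any higher, for example to 64k or 96k. Is 48k the upper limit for the sample rate?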

Answer 1

As stated in the documentation for Cloud Speech API, 48000 Hz is indeed the upper limit supported by this API:

Sample rates between 8000 Hz and 48000 Hz are supported within the Speech API.

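Therefore, to transcribe audio recorded at a higher sample rate (such as 64 kHz or 96 kHz), you will have to resample the audio files to 48000 Hz or lower first.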
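As a rough illustration of the resampling step, here is a minimal sketch that picks a rate inside the supported range and builds an ffmpeg command to convert the file. It assumes ffmpeg is installed locally; the function names and file names are hypothetical and not part of the Speech API, and the 8000/48000 bounds are simply the ones quoted above.

```python
import shlex

MIN_RATE_HZ = 8000    # lower bound quoted from the Speech API documentation
MAX_RATE_HZ = 48000   # upper bound quoted from the Speech API documentation


def target_sample_rate(source_rate_hz):
    """Return the source rate if it is in range, otherwise cap it at 48000 Hz."""
    if source_rate_hz < MIN_RATE_HZ:
        # Assumption: treat audio below 8000 Hz as an error instead of upsampling.
        raise ValueError(
            "sample rate {} Hz is below the supported minimum".format(source_rate_hz)
        )
    return min(source_rate_hz, MAX_RATE_HZ)


def build_resample_command(src_path, dst_path, source_rate_hz):
    """Build (but do not run) an ffmpeg command that resamples src_path."""
    rate = target_sample_rate(source_rate_hz)
    # "-ar" sets the output audio sample rate; "-y" overwrites an existing output file.
    return ["ffmpeg", "-y", "-i", src_path, "-ar", str(rate), dst_path]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_resample_command("input_96k.flac", "output_48k.flac", 96000)
    # To actually run it: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

After converting, upload the resampled file to Cloud Storage and pass sample_rate_hertz=48000 (or whatever rate you resampled to) in the RecognitionConfig.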

I also recommend taking a look at this other page, where you can find basic information about the features supported by the Cloud Speech API.

google-cloud-platform

google-cloud-speech