Transcribing a long audio file doesn't work

Question

I'm trying to transcribe a 30-minute .wav file using the sample code from the Google page. I changed the original code slightly, as shown below:

import os

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Raw strings keep the backslashes in the Windows paths from being read as escapes.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r'C:\Users\louie\Desktop\PSC.json'
gcs_uri = os.path.join(r'C:\Users\louie\Desktop', 'Untitled1.wav')

client = speech.SpeechClient()

audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=44100,
    language_code='en-US')

operation = client.long_running_recognize(config, audio)

print('Waiting for operation to complete...')
response = operation.result(timeout=90)

# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u'Transcript: {}'.format(result.alternatives[0].transcript))
    print('Confidence: {}'.format(result.alternatives[0].confidence))

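When I run it, I get the error "400 Request contains an invalid argument". I'm fairly sure my setup is correct, because the short-audio transcription code works for me. Can anyone help me solve this? Thanks!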

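Edit: I think the problem is related to gcs_uri being in the wrong format. Is there a way to transcribe large audio files without uploading them to Google Cloud Storage?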

Answer 1

I noticed that gcs_uri should actually point to a location in Google Cloud Storage, not to a local path. The format should look like gs://<bucket_name>/<file_path_inside_bucket>
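As a rough sketch of what that looks like in practice: the helpers below check whether a string has the gs:// shape and build such a URI from a bucket name and an object path. The function names, the bucket name "my-bucket" and the object path are all made up for illustration, and upload_and_get_uri assumes the separate google-cloud-storage package is installed and that your credentials can write to the bucket. I have not run it against the asker's project.

```python
import re


def is_gcs_uri(uri):
    """Return True if uri looks like gs://<bucket_name>/<file_path_inside_bucket>."""
    return re.match(r'^gs://[^/]+/.+$', uri) is not None


def build_gcs_uri(bucket_name, object_path):
    """Build a gs:// URI from a bucket name and a path inside that bucket."""
    return 'gs://{}/{}'.format(bucket_name, object_path.lstrip('/'))


def upload_and_get_uri(local_path, bucket_name, object_path):
    """Upload a local file to Cloud Storage and return its gs:// URI.

    Assumption: google-cloud-storage is installed and
    GOOGLE_APPLICATION_CREDENTIALS points at a key with write access.
    """
    # Imported here so the pure-string helpers above work without the package.
    from google.cloud import storage

    object_path = object_path.lstrip('/')
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_path)
    blob.upload_from_filename(local_path)
    return build_gcs_uri(bucket_name, object_path)


# A local Windows path is not a valid value for RecognitionAudio(uri=...).
print(is_gcs_uri(r'C:\Users\louie\Desktop\Untitled1.wav'))
# A hypothetical bucket and object path produce the expected shape.
print(build_gcs_uri('my-bucket', 'audio/Untitled1.wav'))
```

The string returned by upload_and_get_uri is what would be passed as uri= to types.RecognitionAudio in the question's code, in place of the local path.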

Answer 2

There are other ASR APIs that are friendlier to long files.

python

speech-recognition

google-api

speech-to-text