Google Cloud Platform: Speech to Text Conversion of Large Media Files
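I am trying to extract text from an mp4 media file downloaded from YouTube. Since I am already using Google Cloud Platform, I wanted to give Google Cloud Speech a try.
After all the installation and configuration, I copied the following code snippet to get started: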
    import io
    from google.cloud import speech
    from google.cloud.speech import enums, types

    client = speech.SpeechClient()
    with io.open(file_name, 'rb') as audio_file:
        content = audio_file.read()
    audio = types.RecognitionAudio(content=content)
    config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=16000, language_code='en-US')
    response = client.long_running_recognize(config, audio)
But I got the following error about the file size:
    InvalidArgument: 400 Inline audio exceeds duration limit. Please use a GCS URI.
Then I read that I should use streaming for large media files. So I tried the following code snippet:
    with io.open(file_name, 'rb') as audio_file:
        content = audio_file.read()
    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream)
    config = types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, sample_rate_hertz=16000, language_code='en-US')
    streaming_config = types.StreamingRecognitionConfig(config=config)
    responses = client.streaming_recognize(streaming_config, requests)
But I still got the following error:
    InvalidArgument: 400 Invalid audio content: too long.
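The comment in the snippet above says the stream should be a generator yielding chunks of audio data, whereas `stream = [content]` sends the whole file as one chunk. A minimal sketch of such a generator follows; `read_in_chunks` and the 32 KiB chunk size are assumptions for illustration, not part of the original snippet. Note that streaming recognition has its own audio length limit, so chunking alone would probably not be enough for a 10-15 minute file.

```python
import io


def read_in_chunks(path, chunk_size=32 * 1024):
    """Yield successive chunks of raw bytes from the file at `path`.

    `read_in_chunks` and the default chunk size are hypothetical choices
    made for this sketch.
    """
    with io.open(path, "rb") as audio_file:
        while True:
            chunk = audio_file.read(chunk_size)
            if not chunk:
                # End of file reached: stop the generator.
                break
            yield chunk
```

With this helper, the request generator from the snippet above could be written as `(types.StreamingRecognizeRequest(audio_content=chunk) for chunk in read_in_chunks(file_name))`.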
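So, can anyone suggest an approach for transcribing an mp4 file and extracting the text? I don't have any complicated requirements for very large media files. The media files will be at most 10-15 minutes long. Thanks.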
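The error message means the audio is too long to be sent inline in the request. You need to copy the media file to Google Cloud Storage first, and then specify a Cloud Storage URI such as gs://bucket/path/mediafile.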
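The copy step could look roughly like the sketch below. It rests on several assumptions that are not in the thread: the configs shown here declare LINEAR16 or FLAC audio, which an mp4 file is not, so the sketch first extracts the audio track with the `ffmpeg` command-line tool (assumed to be installed) and then uploads it with the `google-cloud-storage` client library. The function names, bucket name, and object name are all hypothetical.

```python
import subprocess


def build_ffmpeg_command(mp4_path, flac_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono FLAC audio from a video."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", mp4_path,          # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to a single (mono) channel
        "-ar", str(sample_rate), # resample to the rate given in the config
        flac_path,
    ]


def gcs_uri(bucket_name, blob_name):
    """Return the gs:// URI for an object in a bucket."""
    return "gs://{}/{}".format(bucket_name, blob_name)


def convert_and_upload(mp4_path, flac_path, bucket_name, blob_name):
    """Convert the mp4 to FLAC, upload it, and return its gs:// URI.

    Assumes ffmpeg is on PATH and GOOGLE_APPLICATION_CREDENTIALS is set.
    """
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import storage

    subprocess.run(build_ffmpeg_command(mp4_path, flac_path), check=True)
    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(flac_path)
    return gcs_uri(bucket_name, blob_name)
```

The returned string is what would be passed as the URI in the recognition request.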
The key to using a Cloud Storage URI is:
    RecognitionAudio audio =
        RecognitionAudio.newBuilder().setUri(gcsUri).build();
The following code shows how to specify a GCS URI as the input. Google has a complete example on GitHub.
    public static void syncRecognizeGcs(String gcsUri) throws Exception {
      // Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
      try (SpeechClient speech = SpeechClient.create()) {
        // Builds the request for remote FLAC file
        RecognitionConfig config =
            RecognitionConfig.newBuilder()
                .setEncoding(AudioEncoding.FLAC)
                .setLanguageCode("en-US")
                .setSampleRateHertz(16000)
                .build();
        RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUri).build();
        // Use blocking call for getting audio transcript
        RecognizeResponse response = speech.recognize(config, audio);
        List<SpeechRecognitionResult> results = response.getResultsList();
        for (SpeechRecognitionResult result : results) {
          // There can be several alternative transcripts for a given chunk of speech. Just use the
          // first (most likely) one here.
          SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
          System.out.printf("Transcription: %s%n", alternative.getTranscript());
        }
      }
    }
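Since the question is in Python, here is a rough Python counterpart. The Java sample above uses the blocking `recognize` call, which as far as I know is meant for short audio (about a minute), so for 10-15 minute files the long-running call with a GCS URI is probably the closer fit. This is a sketch under assumptions: it uses the newer `google-cloud-speech` 2.x client style (which differs from the `enums`/`types` imports in the question), it expects a 16 kHz FLAC file already uploaded to Cloud Storage, and `transcribe_gcs`, `join_transcripts`, and the timeout value are hypothetical choices.

```python
def join_transcripts(results):
    """Join the most likely transcript of each result, one per line."""
    lines = []
    for result in results:
        # Each result may carry several alternatives; take the first
        # (most likely) one and skip results that have none.
        if result.alternatives:
            lines.append(result.alternatives[0].transcript)
    return "\n".join(lines)


def transcribe_gcs(gcs_uri, timeout_seconds=900):
    """Transcribe a FLAC file stored in Cloud Storage (assumed 16 kHz, en-US)."""
    # Imported lazily so join_transcripts works without the library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)

    # Start the asynchronous job, then block until it finishes or times out.
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout_seconds)
    return join_transcripts(response.results)
```

A call such as `transcribe_gcs("gs://bucket/path/mediafile")` would then return the transcript as a single string.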