How to read mp3 data from Google Cloud using Python
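I am trying to read mp3/wav data from Google Cloud Storage and apply speaker diarization to the audio. The problem is that I cannot read the result that the Google API returns in the variable response.
Below is my Python code: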
speech_file = r'gs://pp003231/a4a.wav'
config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)
audio = speech.types.RecognitionAudio(uri=speech_file)
response = client.long_running_recognize(config, audio)
print response
result = response.results[-1]
print result
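The output shown on the console is:
Traceback (most recent call last):
  File "a1.py", line 131, in
    print response.results
AttributeError: 'Operation' object has no attribute 'results'
Could you share your expert advice on what I am doing wrong?
Thanks for your help.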
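Do you have access to the wav file in the bucket? Also, is this the whole code? It seems that sample_rate_hertz and the imports are missing. Here is the code copied and pasted from the Google docs sample, which I edited down to just the diarization feature.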
#!/usr/bin/env python
"""Google Cloud Speech API sample that demonstrates speaker diarization.

Example usage:
    python diarization.py
"""


def transcribe_file_with_diarization():
    """Transcribe the given audio file synchronously with diarization."""
    # [START speech_transcribe_diarization_beta]
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()

    # Replace the placeholders with your own bucket and file names.
    audio = speech.types.RecognitionAudio(uri="gs://<YOUR_BUCKET>/<YOUR_WAV_FILE>")

    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)

    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]

    words_info = result.alternatives[0].words

    # Printing out the output:
    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word,
                                                   word_info.speaker_tag))
    # [END speech_transcribe_diarization_beta]


if __name__ == '__main__':
    transcribe_file_with_diarization()
To run the code, just save it as diarization.py and use the command:
python diarization.py
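One thing worth checking before running: the sample hard-codes sample_rate_hertz=8000, which may not match your recording. A minimal sketch for reading the real value, assuming you have a local copy of the same PCM WAV file (the helper name wav_format is made up for illustration, and the standard-library wave module only handles uncompressed WAV, not mp3):

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) from a WAV header.

    Hypothetical helper, not part of the Google sample. It only reads the
    header of a local PCM WAV file; it does not touch Cloud Storage.
    """
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


# Example use (the path is a placeholder):
#   rate, channels, width = wav_format("a4a.wav")
#   Pass `rate` as sample_rate_hertz. LINEAR16 implies 16-bit samples,
#   so `width` should be 2 for that encoding to be the right choice.
```

If the sample width is not 2 bytes, LINEAR16 is probably not the right encoding for that file.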
Also, you have to install the latest google-cloud-speech library:
pip install --upgrade google-cloud-speech
You also need the credentials of your service account in a JSON file; the Google Cloud documentation on authentication has more information.
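For the credentials step, the client libraries normally locate the JSON key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. A small sketch of a check you could run before creating the client, so a missing key fails with a clear message (the function name credentials_path is hypothetical and not part of any Google library):

```python
import os


def credentials_path():
    """Return the service-account key path, or raise with a readable message.

    Hypothetical pre-flight check: it only inspects the environment
    variable and the file system, and does not validate the key itself.
    """
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError("Credentials file not found: {}".format(path))
    return path


# Example use, before calling speech.SpeechClient():
#   print("Using credentials from", credentials_path())
```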
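It is too late for the author of this thread, but since I had a similar problem, I am posting the solution for anyone who runs into it in the future.
Change
result = response.results[-1]
to
result = response.result().results[-1]
and it will work. long_running_recognize returns an Operation rather than the response itself, so you have to call result() on it to wait for the job and get the object that has results.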
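To spell that out a bit more, here is a rough sketch of a helper that waits on the operation and pulls out the speaker-tagged words. The helper name speaker_tagged_words, the 300-second timeout, and the fake classes are my own choices for illustration; the fakes exist only so the snippet can run without credentials or network access.

```python
from types import SimpleNamespace


def speaker_tagged_words(operation, timeout=300):
    """Wait for a long-running recognition and return (word, speaker_tag) pairs.

    `operation` is whatever client.long_running_recognize(...) returned.
    """
    # result() blocks until the job finishes (or raises on timeout/failure)
    # and returns the response object that actually carries `.results`.
    response = operation.result(timeout=timeout)
    if not response.results:
        return []
    # As in the sample above, the last result's word list covers all words.
    words = response.results[-1].alternatives[0].words
    return [(w.word, w.speaker_tag) for w in words]


class FakeOperation:
    """Hypothetical stand-in for the real Operation, for offline runs only."""

    def __init__(self, response):
        self._response = response

    def result(self, timeout=None):
        return self._response


def fake_response(pairs):
    """Build an object shaped like the recognition response (illustrative)."""
    words = [SimpleNamespace(word=w, speaker_tag=t) for w, t in pairs]
    alternative = SimpleNamespace(words=words)
    return SimpleNamespace(results=[SimpleNamespace(alternatives=[alternative])])


if __name__ == "__main__":
    op = FakeOperation(fake_response([("hello", 1), ("hi", 2)]))
    print(speaker_tagged_words(op))
```

With the real client you would pass the return value of client.long_running_recognize(config, audio) instead of the fake operation.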