Error with enable_speaker_diarization tag in Google Cloud Speech to Text
Using Google Speech-to-Text, I can transcribe an audio clip with the default parameters. However, I get an error message when I use the enable_speaker_diarization tag to identify the individual speakers in the audio clip. Google documents it here.
It is a long audio clip to recognize, so I am using an asynchronous request, as Google recommends here.
My code -
def transcribe_gcs(gcs_uri):
    from google.cloud import speech
    from google.cloud import speech_v1 as speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(encoding=speech.enums.RecognitionConfig.AudioEncoding.FLAC,
                                            sample_rate_hertz=16000,
                                            language_code='en-US',
                                            enable_speaker_diarization=True,
                                            diarization_speaker_count=2)
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=3000)
    result = response.results[-1]
    words_info = result.alternatives[0].words
    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word, word_info.speaker_tag))
After calling it with -
transcribe_gcs('gs://bucket_name/filename.flac')
I get the error
ValueError: Protocol message RecognitionConfig has no "enable_speaker_diarization" field.
I'm sure this has to do with the library; I have tried every variant I could find, such as
from google.cloud import speech_v1p1beta1 as speech
from google.cloud import speech
But I keep getting the same error.
Note - I have already authenticated with a JSON file before running this code.
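For reference, that JSON-file authentication is typically wired up by pointing the client library at the key file through an environment variable; a minimal sketch (the path below is a placeholder, not a real file):

```python
import os

# Hypothetical path to a service-account key file; the Google client
# libraries read this environment variable to locate credentials.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account.json"
```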
The enable_speaker_diarization=True parameter of speech.types.RecognitionConfig is currently only available in the speech_v1p1beta1 library, so you need to import that library to use the parameter, instead of the default speech one. I made some modifications to your code and it works fine for me. Note that you need to use a service account to run this code.
def transcribe_gcs(gcs_uri):
    from google.cloud import speech_v1p1beta1 as speech
    from google.cloud.speech_v1p1beta1 import enums
    from google.cloud.speech_v1p1beta1 import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(language_code='en-US',
                                            enable_speaker_diarization=True,
                                            diarization_speaker_count=2)
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=3000)
    # The last result holds the full transcript with per-word speaker tags.
    result = response.results[-1]
    words_info = result.alternatives[0].words
    # Collapse consecutive words with the same speaker tag into one line.
    tag = 1
    speaker = ""
    for word_info in words_info:
        if word_info.speaker_tag == tag:
            speaker = speaker + " " + word_info.word
        else:
            print("speaker {}: {}".format(tag, speaker))
            tag = word_info.speaker_tag
            speaker = "" + word_info.word
    print("speaker {}: {}".format(tag, speaker))
The result should look like this:
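The speaker-grouping loop can also be exercised without calling the API at all; here is a stand-alone sketch of the same logic using itertools.groupby on made-up (word, speaker_tag) pairs (the data is illustrative only, standing in for the API's word_info objects):

```python
from itertools import groupby

def group_by_speaker(words):
    # Collapse consecutive (word, speaker_tag) pairs into one line per
    # speaker turn, mirroring the loop in the code above.
    return ["speaker {}: {}".format(tag, " ".join(w for w, _ in grp))
            for tag, grp in groupby(words, key=lambda pair: pair[1])]

# Made-up pairs standing in for the word_info objects from the API.
sample = [("hello", 1), ("there", 1), ("hi", 2), ("how", 2), ("are", 2), ("you", 2)]
for line in group_by_speaker(sample):
    print(line)
```

This prints "speaker 1: hello there" followed by "speaker 2: hi how are you".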
The cause of the error is similar for Node.js users. Import the beta functionality with this call, then use the speaker diarization feature.
const speech = require('@google-cloud/speech').v1p1beta1;
The error is because you haven't imported the right modules. To fix it, use the following imports.
from google.cloud import speech_v1p1beta1 as speech
from google.cloud.speech_v1p1beta1 import enums
from google.cloud.speech_v1p1beta1 import types