RecognitionConfig error with .flac files in Google transcribe
I am trying to transcribe an audio file with Google Cloud. Here is my code:
from google.cloud.speech_v1 import enums
from google.cloud import speech_v1p1beta1
import os
import io


def sample_long_running_recognize(local_file_path):
    client = speech_v1p1beta1.SpeechClient()

    # local_file_path = 'resources/commercial_mono.wav'

    # If enabled, each word in the first alternative of each result will be
    # tagged with a speaker tag to identify the speaker.
    enable_speaker_diarization = True

    # Optional. Specifies the estimated number of speakers in the conversation.
    diarization_speaker_count = 2

    # The language of the supplied audio
    language_code = "en-US"
    config = {
        "enable_speaker_diarization": enable_speaker_diarization,
        "diarization_speaker_count": diarization_speaker_count,
        "language_code": language_code,
        "encoding": enums.RecognitionConfig.AudioEncoding.FLAC
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}
    # audio = {"uri": storage_uri}

    operation = client.long_running_recognize(config, audio)

    print(u"Waiting for operation to complete...")
    response = operation.result()

    for result in response.results:
        # First alternative has words tagged with speakers
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))
        # Print the speaker_tag of each word
        for word in alternative.words:
            print(u"Word: {}".format(word.word))
            print(u"Speaker tag: {}".format(word.speaker_tag))


sample_long_running_recognize('/Users/asi/Downloads/trimmed_3.flac')
I keep getting this error:
google.api_core.exceptions.InvalidArgument: 400 audio_channel_count `1` in RecognitionConfig must either be unspecified or match the value in the FLAC header `2`.
I can't figure out what I'm doing wrong. I copied and pasted most of this from the Google Cloud Speech API docs. Any advice?
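This property (audio_channel_count) is the number of channels in the input audio data, and you only need to set it for multi-channel recognition. I assume that is your case: the error says the FLAC header reports 2 channels, so, as the message suggests, you need to set 'audio_channel_count': 2 in your config to exactly match your audio file.
For details on the properties of the RecognitionConfig object, see the source code.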
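If you would rather not hard-code the value, here is a minimal sketch that reads the channel count straight from the FLAC header and puts it into the config dict. The helper names flac_channel_count and build_config are made up for illustration and are not part of the Google client library. It assumes the file starts with the "fLaC" marker followed directly by the STREAMINFO block (no ID3 tag prepended), and it leaves out the "encoding" entry so it runs without the library installed; add that entry back exactly as in the question.

```python
def flac_channel_count(data: bytes) -> int:
    """Return the channel count stored in the FLAC STREAMINFO block.

    Hypothetical helper: assumes `data` begins with the 'fLaC' marker
    and that STREAMINFO is the first metadata block.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Metadata block header: 1 flag bit + 7-bit block type, then a 24-bit length.
    block_type = data[4] & 0x7F
    if block_type != 0 or len(data) < 8 + 34:
        raise ValueError("STREAMINFO block not found")
    streaminfo = data[8:8 + 34]
    # Byte 12 of STREAMINFO packs: the low 4 bits of the sample rate,
    # 3 bits holding (channels - 1), and the top bit of (bits per sample - 1).
    return ((streaminfo[12] >> 1) & 0x07) + 1


def build_config(flac_bytes: bytes, language_code: str = "en-US") -> dict:
    """Build the config dict from the question, with the channel count filled in."""
    return {
        "enable_speaker_diarization": True,
        "diarization_speaker_count": 2,
        "language_code": language_code,
        # Must match the value in the FLAC header, per the error message.
        "audio_channel_count": flac_channel_count(flac_bytes),
    }
```

In the question's function you would call build_config(content) right after content = f.read(), add the "encoding" entry, and pass the result to client.long_running_recognize(config, audio) as before.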