Error "sample_rate_hertz in RecognitionConfig must either be unspecified or match the value in the FLAC header" in the Google Speech-to-Text API

I tried converting the audio from stereo to mono, and I tried changing the sample rate (in hertz), but neither worked.

from pydub import AudioSegment

from google.cloud import speech_v1p1beta1 as speech
import os, logging 
import urllib.request

KEY_API_ROOT = 'path'
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]=KEY_API_ROOT+"xx.json"
client = speech.SpeechClient()

url = incoming_message['entry'][0]['messaging'][0]['message']['attachments'][0]['payload']['url']

if '.aac' in url:
    formato = 'aac'
else:
    formato = 'mp4'

# download audio
urllib.request.urlretrieve(url, VOICE_ROOT + fbid + "." + formato)

# path
diretorio_audio = VOICE_ROOT + fbid + "." + formato

mp4_version = AudioSegment.from_file(diretorio_audio, formato)

mp4_version.export(VOICE_ROOT + fbid + ".flac", format="flac", bitrate="400k", parameters=["-ac", "1"])

with open(VOICE_ROOT + fbid + '.flac', 'rb') as audio_file:
    content = audio_file.read()

audio = speech.types.RecognitionAudio(content=content)

config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=44100,
    language_code='en-US',
    enable_word_confidence=True)
try:
    response = client.recognize(config, audio)
except Exception as erro_stt:
    logging.info("Erro 66 ProcessarAudio no STT: {}".format(erro_stt))

Error:

400 sample_rate_hertz (44100) in RecognitionConfig must either be unspecified or match the value in the FLAC header (48000).
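
The message says the FLAC file itself declares 48000 Hz while the config claims 44100 Hz. One way to check what a FLAC file declares before calling the API is to read its STREAMINFO block directly. The following is a minimal sketch using only the standard library; the helper name `read_flac_stream_info` is hypothetical and not part of pydub or the Google client, and it assumes the data starts with the `fLaC` marker (no ID3 tag in front).

```python
def read_flac_stream_info(data: bytes) -> dict:
    """Parse sample rate, channels and bit depth from a FLAC STREAMINFO block.

    Hypothetical helper for illustration; assumes `data` begins with the
    'fLaC' marker followed directly by the STREAMINFO metadata block.
    """
    if len(data) < 22 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Byte 4 is the metadata block header; its low 7 bits are the block type.
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # The STREAMINFO body starts at byte 8. After 10 bytes of block and frame
    # sizes come 20 bits of sample rate, 3 bits of (channels - 1) and
    # 5 bits of (bits per sample - 1), packed big-endian.
    b0, b1, b2, b3 = data[18:22]
    sample_rate = (b0 << 12) | (b1 << 4) | (b2 >> 4)
    channels = ((b2 >> 1) & 0x07) + 1
    bits_per_sample = (((b2 & 0x01) << 4) | (b3 >> 4)) + 1
    return {
        "sample_rate_hertz": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
    }
```

With the `content` bytes read from the exported file, `read_flac_stream_info(content)["sample_rate_hertz"]` could be passed as `sample_rate_hertz`. Alternatively, as the error message itself states, the field may be left unspecified for FLAC input.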

I solved the problem as follows.

Install the packages:

sudo apt-get install sox

sudo apt-get install libsox-fmt-mp3

Then run:

sox input.mp3 --rate 16k --bits 16 --channels 1 output.flac
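
If the conversion has to happen inside the Python script rather than by hand, the same sox invocation can be wrapped with `subprocess`. This is a sketch only: `build_sox_command` and `convert_to_flac` are hypothetical helper names, and it assumes the `sox` binary and the format plugin installed above are available on the PATH.

```python
import subprocess


def build_sox_command(src, dst, rate="16k", bits=16, channels=1):
    """Return the argv list for the sox command shown above (hypothetical helper)."""
    return [
        "sox", src,
        "--rate", str(rate),
        "--bits", str(bits),
        "--channels", str(channels),
        dst,
    ]


def convert_to_flac(src, dst):
    """Run sox and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_sox_command(src, dst), check=True)
```

Because the output is resampled to 16 kHz, the `RecognitionConfig` should then use `sample_rate_hertz=16000` (or omit the field) so that it matches the new FLAC header.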