400 Specify MP3 encoding to match audio file
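I'm trying to use the google-speech2text API. However, even though I've set my code to go through every available encoding, I still get "Specify MP3 encoding to match audio file".
This is the file I'm trying to use.
I should add that if I upload the file through their UI, I do get output, so I assume there is nothing wrong with the source file.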
import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')
speech_file = 'chunk7.mp3'

with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]
SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')
        # Detects speech in the audio file
        response = []
        print(response)
        try:
            response = client.recognize(config, audio)
            print(response)
        except:
            pass
        print("-----------------------------------------------------")
        print(str(rate) + " " + str(enco))
        print("response: ", str(response))
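Alternatively, there is another file in Persian here ('fa-IR'), and I have the same problem with it. I originally posted the Obama file because it is easier to understand. I would appreciate it if you also tested your answer with the second file.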
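Your audio format appears to be unsupported.
You can fix this easily by converting the file to another format (FLAC is recommended). You have two options:
- Search Google for an online audio converter
- Convert it yourself on your machine:
1) Install sox
2) Install the encoders it needs:
* [lame](http://lame.sourceforge.net) mp3 encoder
* [flac](https://xiph.org/flac/download.html) flac encoder
3) Run the command: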
sox source.mp3 --channels=1 --bits=16 dest.flac
In this case you can also run the command from Python:
import subprocess
subprocess.check_output(['sox',sourcePath,'--channels=1','--bits=16',destPath])
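A slightly more defensive variant of the call above might look like the following. This is only a sketch: the function names `build_sox_command` and `convert_with_sox` are made up for illustration, and it assumes sox is installed and reachable on PATH. The options are the same ones used in the command above.

```python
import shutil
import subprocess


def build_sox_command(source_path, dest_path, channels=1, bits=16):
    """Return the sox argument list used in the command above."""
    return [
        "sox",
        source_path,
        "--channels={}".format(channels),
        "--bits={}".format(bits),
        dest_path,
    ]


def convert_with_sox(source_path, dest_path):
    """Run sox and raise if it is missing or exits with an error."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH; install it first")
    # check=True raises CalledProcessError on a non-zero exit status
    subprocess.run(build_sox_command(source_path, dest_path), check=True)
```

Calling `convert_with_sox('chunk7.mp3', 'chunk7.flac')` would then produce the FLAC file to send to the API, provided sox was built with MP3 support.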
Note that you do not need to specify sample_rate_hertz or encoding, because all of that information is in the FLAC headers themselves, so you can omit them:
config = types.RecognitionConfig(language_code="fa-IR")
response = client.recognize(config, audio)
It looks like you are setting encoding to every possible value the API offers. I found that:
encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
works for mp3 files. So try this:
from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io

speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """
    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000
    # If this fails, try one of the other rates: 8000, 12000, 24000, 48000

    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))


sample_recognize(speech_file)
The code above is a slightly modified version of the example in the speech-to-text docs. If that doesn't work, try looking deeper into the encoding docs and best practices. Good luck.
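Since the error message is about the encoding not matching the file, it can help to check what the file actually contains before picking an encoding value. The sketch below guesses the format from the first few bytes of the file. It is not part of the Google client library, the function names are invented for this example, and it only looks at the leading bytes, so it can be fooled by unusual files.

```python
def sniff_audio_format(header):
    """Guess the audio format from the leading bytes of a file."""
    if header[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, normally followed by MPEG audio frames
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"OggS":
        return "ogg"
    # MPEG audio frame header: 11 sync bits, then version and layer bits.
    # Layer bits 01 mean Layer III, which is what MP3 files use.
    if (len(header) >= 2 and header[0] == 0xFF
            and (header[1] & 0xE0) == 0xE0
            and (header[1] & 0x06) == 0x02):
        return "mp3"
    return "unknown"


def sniff_file(path):
    """Read the first 12 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

For example, `sniff_file('chunk7.mp3')` would be expected to return `"mp3"` for a genuine MP3 file. If it returns something else, the file extension is probably misleading and the encoding should be chosen for the real format.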