400 Specify MP3 encoding to match audio file

Question

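I am trying to use the google-speech2text API, but even though my code loops through every available encoding, I still get "Specify MP3 encoding to match audio file".

This is the file I am trying to use.

I should add that if I upload the file through their UI, I do get output, so I assume there is nothing wrong with the source file.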

import io

from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

client = speech.SpeechClient.from_service_account_json('gcp_credentials.json')

speech_file = 'chunk7.mp3'

# Read the whole audio file into memory and wrap it for the API
with io.open(speech_file, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)

ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16, 
            enums.RecognitionConfig.AudioEncoding.FLAC,
            enums.RecognitionConfig.AudioEncoding.MULAW,
            enums.RecognitionConfig.AudioEncoding.AMR,
            enums.RecognitionConfig.AudioEncoding.AMR_WB,
            enums.RecognitionConfig.AudioEncoding.OGG_OPUS, 
            enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]

SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]
for enco in ENCODING:
    for rate in SAMPLE_RATE_HERTZ:
        config = types.RecognitionConfig(
            encoding=enco,
            sample_rate_hertz=rate,
            language_code='en-US')

        # Detects speech in the audio file; print any error instead of hiding it
        response = None
        try:
            response = client.recognize(config, audio)
        except Exception as exc:
            print("error: ", exc)
        print("-----------------------------------------------------")
        print(str(rate) + "   " + str(enco))
        print("response: ", str(response))

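Alternatively, there is another file in Persian here ('fa-IR'), and I have a similar problem with it. I put the Obama file up first because it is easier to understand. I would appreciate it if you also tested your answer with the second file.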

Answer 1

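Your audio format appears to be unsupported. Converting it to another format (flac is recommended) is straightforward, and you have two options:

Search Google for an online audio converter
Convert it yourself on your machine:

1) Install sox

2) Install the encoders it needs: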
```
 * [lame](http://lame.sourceforge.net) mp3 encoder
 * [flac](https://xiph.org/flac/download.html) flac encoder
```
3) Run the command:

sox source.mp3 --channels=1 --bits=16 dest.flac

You can also run the command from Python:

import subprocess
# sourcePath and destPath are the input and output file paths
subprocess.check_output(['sox', sourcePath, '--channels=1', '--bits=16', destPath])
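
The conversion step can also be wrapped in a small helper. This is only a minimal sketch, assuming sox is installed and on your PATH; the names `build_sox_command` and `convert_to_flac` are made up for illustration, and the flags are the same ones used in the command above.

```python
import shutil
import subprocess


def build_sox_command(source_path, dest_path, channels=1, bits=16):
    """Return the argument list for the sox call shown above."""
    return [
        "sox",
        source_path,
        "--channels={}".format(channels),
        "--bits={}".format(bits),
        dest_path,
    ]


def convert_to_flac(source_path, dest_path):
    """Convert an audio file to mono 16-bit FLAC by calling sox."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox not found on PATH")
    # check=True raises CalledProcessError if sox exits with an error
    subprocess.run(build_sox_command(source_path, dest_path), check=True)
    return dest_path
```

Calling `convert_to_flac('chunk7.mp3', 'chunk7.flac')` would then produce the file to send to the API. Keeping the command construction in its own function makes it possible to inspect the arguments without running sox.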

Note that you do not need to specify sample_rate_hertz or encoding, because all of that information is in the flac headers themselves, so you can omit them:

config = types.RecognitionConfig(language_code="fa-IR")
response = client.recognize(config, audio)
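
If you want to see what the flac header actually carries, the sample rate, channel count and bit depth can be read from the STREAMINFO block at the start of the file. The following is a rough sketch using only the standard library; `read_flac_streaminfo` is a hypothetical helper name, and it assumes STREAMINFO is the first metadata block, which the FLAC format requires.

```python
import struct


def read_flac_streaminfo(data):
    """Parse sample rate, channels and bit depth from the start of FLAC bytes."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    # Metadata block header: 1 byte (last-block flag + 7-bit type), 3 bytes length
    block_type = data[4] & 0x7F
    if block_type != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    body = data[8:8 + 34]
    if len(body) < 34:
        raise ValueError("truncated STREAMINFO block")
    # Bytes 10..17 pack: 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total sample count
    (packed,) = struct.unpack(">Q", body[10:18])
    return {
        "sample_rate_hertz": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }
```

Passing the `content` bytes read from the converted file to `read_flac_streaminfo` lets you confirm the file is mono and 16-bit before sending it.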

Resource: troubleshooting

Answer 2

You appear to be setting encoding to every possible value the API offers. I found that:

encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED

works for mp3 files. So try this:

from google.cloud import speech_v1
from google.cloud.speech_v1 import enums
import io
speech_file = 'chunk7.mp3'


def sample_recognize(local_file_path):
    """
    Transcribe a short audio file using synchronous speech recognition

    Args:
      local_file_path Path to local audio file, e.g. /path/audio.wav
    """

    client = speech_v1.SpeechClient()

    # local_file_path = 'resources/brooklyn_bridge.raw'

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent
    sample_rate_hertz = 16000
    # If this fails, try one of 8000, 12000, 16000, 24000, 48000


    # Encoding of audio data sent. This sample sets this explicitly.
    # This field is optional for FLAC and WAV audio formats.
    encoding = enums.RecognitionConfig.AudioEncoding.ENCODING_UNSPECIFIED
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
        "encoding": encoding,
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}

    response = client.recognize(config, audio)
    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))

sample_recognize(speech_file)

The code above is an example from the speech-to-text docs, slightly modified. If that doesn't work, try looking deeper into the encoding docs and best practices. Good luck.
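
Since the error message suggests the service identified the upload as MP3, it may help to check what a file really contains before choosing an encoding value. Below is a rough heuristic based on the leading bytes of the file; `sniff_audio_container` is a hypothetical helper, not part of the Google client library, and the MPEG frame-sync check can also match other MPEG audio streams, so treat the result as a hint only.

```python
def sniff_audio_container(data):
    """Guess the audio container/codec from the first bytes of a file."""
    if data[:4] == b"fLaC":
        return "flac"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    # MP3 files usually start with an ID3v2 tag...
    if data[:3] == b"ID3":
        return "mp3"
    # ...or directly with an MPEG frame sync: 0xFF followed by a byte
    # whose top three bits are set (11 sync bits in total)
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

For example, `sniff_audio_container(content)` on the bytes read from `chunk7.mp3` should report `"mp3"` if the file is what its extension claims.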

google-speech-api

google-cloud-speech