Google Speech API Returning Blank Json Response

Question

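I want to use the Google Speech API V1 with Python.

So far I have got it to work using a Google URI example and received content back. When I modified the code to use a custom recorded audio file, I got a response from Google, but it contained no transcription.

I set up the request as follows: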

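import base64
import json
import time
from datetime import datetime

# get_speech_service() is the asker's own helper (not shown here).

"""Transcribe the given raw audio file asynchronously.
Args:
    audio_file: the raw audio file.
"""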
audio_file = 'audioFiles/test.raw'

with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000, 
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('utf-8', 'ignore')
            }
        })
response = service_request.execute()

print(json.dumps(response))

name = response['name']

service = get_speech_service()
service_request = service.operations().get(name=name)

while True:
    # Get the long running operation with response.
    response = service_request.execute()

    if 'done' in response and response['done']:
        break
    else:
        # Give the server a few seconds to process.
        print('%s, waiting for results from job, %s' % (datetime.now().replace(second=0, microsecond=0), name))
        time.sleep(60)

print(json.dumps(response))

This gives me the following response:

kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py 
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}

Whereas I should be getting a response of the form:

{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...

The raw audio file I am using is:

16000 Hz sample rate (also tried 41000 Hz)
16-bit little endian
Signed
65 seconds long

To record this audio I run:

arecord -f cd -d 65 -r 16000 -t raw test.raw

Any advice that could point me in the right direction would be greatly appreciated.

Answer 1

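Your example is basically the same as this sample, which is working for me with the test audio files.

Does your code work with the test sample audio.raw? If so, it is most likely an encoding issue. I have had the most success with FLAC files and with recording as recommended in the best practices. I have also used Audacity in the past to take some of the guesswork out of recording.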
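One quick way to test the encoding theory is to compare the size of the raw file with the size that mono 16-bit audio of that length should have. The sketch below assumes headerless PCM data; the function names are illustrative and not part of any Google library. It may be worth knowing that arecord's `-f cd` preset stands for 16-bit little endian, 44100 Hz, stereo, so overriding only the rate with `-r 16000` would still leave two channels in the file.

```python
import os


def expected_pcm_bytes(sample_rate, seconds, channels=1, bytes_per_sample=2):
    """Size in bytes of headerless PCM audio with the given parameters."""
    return sample_rate * seconds * channels * bytes_per_sample


def guess_channels(file_size, sample_rate, seconds, bytes_per_sample=2):
    """Estimate the channel count by comparing against the mono size."""
    mono = expected_pcm_bytes(sample_rate, seconds, 1, bytes_per_sample)
    return int(round(file_size / float(mono)))


if __name__ == '__main__':
    path = 'audioFiles/test.raw'  # path taken from the question
    if os.path.exists(path):
        size = os.path.getsize(path)
        print('%d bytes, roughly %d channel(s)'
              % (size, guess_channels(size, 16000, 65)))
```

If the estimate comes out as 2 for a 65 second recording at 16000 Hz, the file is probably stereo, and re-recording with a single channel would be the first thing to try.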

On Mac OS X, the following shell command worked for capturing 65 seconds of audio:

  rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65
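Before uploading, the header of the resulting WAV file can be checked against the values that will go into the request. This is a minimal sketch using Python's standard `wave` module; `describe_wav` and `matches_config` are hypothetical helper names, and the check assumes a plain PCM WAV file such as the one the command above is expected to produce.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, frame_rate, duration_seconds)."""
    wav = wave.open(path, 'rb')
    try:
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)
    finally:
        wav.close()
    return channels, width, rate, duration


def matches_config(path, sample_rate):
    """True if the file is mono, 16-bit, and recorded at sample_rate."""
    channels, width, rate, _ = describe_wav(path)
    return channels == 1 and width == 2 and rate == sample_rate
```

For the recording above, `matches_config('audio.wav', 44100)` should hold if the file was written as requested.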

I then transcribed the audio using the following code:

import io
import time

from google.cloud import speech


# Wrapped in a function so that the early return below is valid.
def transcribe(speech_file):
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate=44100)

    operation = speech_client.speech_api.async_recognize(audio_sample)

    # Poll every 2 seconds, up to 100 times, until the operation finishes.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))


transcribe('audio.wav')

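Note that in my example I am using the new client library, which makes it easier to access the API. This sample code is the starting point from which I took my example.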
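For the raw JSON returned by the REST call in the question, a small helper can make an empty result obvious instead of leaving it buried in the printed dictionary. The sketch below assumes the usual `results[].alternatives[].transcript` layout of the recognize response; `extract_transcripts` is a hypothetical name, not a library function.

```python
def extract_transcripts(operation):
    """Collect the top transcript of each result in a finished operation."""
    results = operation.get('response', {}).get('results', [])
    transcripts = []
    for result in results:
        alternatives = result.get('alternatives', [])
        if alternatives:
            transcripts.append(alternatives[0].get('transcript', ''))
    return transcripts


# The blank response from the question has no 'results' key at all.
blank = {
    'done': True,
    'name': '527788331906219767',
    'response': {
        '@type': 'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse',
    },
}
if not extract_transcripts(blank):
    print('Operation finished, but no speech was recognized.')
```

An empty list for an operation that reports `done` suggests the service could not make sense of the audio, which points back to the encoding, sample rate, or channel count rather than to the request code.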
