Google Speech API Returning Blank Json Response

I want to use the Google Speech API V1 with Python.

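So far I have got it working with the Google URI example and received content back. When I modified the code to use a custom recorded audio file, I still get a response from Google, but it contains no transcribed content.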

I set up the request as follows:

import base64
import json
import time
from datetime import datetime

# Transcribe the given raw audio file asynchronously.
# get_speech_service() is defined elsewhere in the script and returns the API client.
audio_file = 'audioFiles/test.raw'

# The REST API expects the audio bytes as a base64-encoded string.
with open(audio_file, 'rb') as speech:
    speech_content = base64.b64encode(speech.read())

service = get_speech_service()
service_request = service.speech().asyncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000,
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('utf-8', 'ignore')
        }
    })
response = service_request.execute()

print(json.dumps(response))

# The initial response only carries the name of the long-running operation.
name = response['name']

service = get_speech_service()
service_request = service.operations().get(name=name)

while True:
    # Get the long running operation with response.
    response = service_request.execute()

    if response.get('done'):
        break

    # Give the server time to process before polling again.
    print('%s, waiting for results from job, %s' % (
        datetime.now().replace(second=0, microsecond=0), name))
    time.sleep(60)

print(json.dumps(response))
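
The loop above keeps polling for as long as the operation is not done. A bounded variant might look like the sketch below. It assumes a hypothetical `fetch` callable that wraps `service_request.execute()` and returns the operation as a dict; `poll_operation`, `interval`, `timeout`, and `sleep` are illustrative names, not part of the API.

```python
import time


def poll_operation(fetch, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until the returned dict reports done, or give up after timeout seconds."""
    waited = 0.0
    while True:
        operation = fetch()
        if operation.get('done'):
            return operation
        if waited >= timeout:
            raise TimeoutError('operation not done after %.0f seconds' % waited)
        # Wait before asking the server again.
        sleep(interval)
        waited += interval
```

With the request from the script this could be called as `poll_operation(service_request.execute, interval=10)`.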

The response this gives me is:

kayl@kayl-Surface-Pro-3:~/audioConversion$ python speechToText.py 
{"name": "527788331906219767"}
2017-03-30 20:10:00, waiting for results from job, 527788331906219767
{"response": {"@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"},"done": true, "name": "527788331906219767", "metadata": {"lastUpdateTime": "2017-03-31T03:11:16.391628Z", "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "startTime": "2017-03-31T03:10:52.351004Z", "progressPercent": 100}}

Whereas I expected a response of the form:

{"response": {"@type":"type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse", "results":{...}}...
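
A small helper makes the difference between the two responses easy to spot. This is a sketch only: it assumes `results` is a list whose entries each carry an `alternatives` list with a `transcript` field, and `extract_transcripts` is a hypothetical name.

```python
def extract_transcripts(operation):
    """Return the top transcript of each result, or [] when the response has none."""
    results = operation.get('response', {}).get('results', [])
    transcripts = []
    for result in results:
        alternatives = result.get('alternatives', [])
        if alternatives:
            # The first alternative is taken as the preferred one (assumption).
            transcripts.append(alternatives[0].get('transcript', ''))
    return transcripts
```

For the blank response shown above, `extract_transcripts(response)` returns an empty list, because the `response` object has no `results` key.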

This is with the raw audio file I recorded.

To record this audio, I ran:

arecord -f cd -d 65 -r 16000 -t raw test.raw

Any advice that points me in the right direction would be greatly appreciated.

Your example is basically the same as this sample, which is working for me with the test audio files.

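Does your code work with the test sample audio.raw? If so, it is most likely an encoding issue. I have had the most success with FLAC files and with recording as recommended in the best practices. I have also used Audacity in the past to take some of the guesswork out of recording.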
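
If encoding is the culprit, the channel count is one thing worth checking. As far as I know, arecord's `-f cd` preset selects 16-bit little-endian stereo, so the recording in the question may have two channels even though the request describes plain LINEAR16 at 16000 Hz. Assuming that is the case, here is a minimal sketch that averages interleaved 16-bit stereo into mono; `stereo_to_mono` is a hypothetical helper, not something from the API or the samples.

```python
import sys
from array import array


def stereo_to_mono(raw_bytes):
    """Average interleaved 16-bit little-endian stereo frames into mono samples."""
    samples = array('h')
    samples.frombytes(raw_bytes)
    if sys.byteorder == 'big':
        samples.byteswap()  # the raw data is little-endian
    if len(samples) % 2:
        raise ValueError('expected an even number of samples for stereo input')
    # Each frame is (left, right); keep the average of the two.
    mono = array('h', ((samples[i] + samples[i + 1]) // 2
                       for i in range(0, len(samples), 2)))
    if sys.byteorder == 'big':
        mono.byteswap()
    return mono.tobytes()
```

The result can be base64-encoded and sent in place of the original bytes. Recording a single channel to begin with avoids the conversion altogether.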

On Mac OS X, the following shell command records 65 seconds of audio:

  rec --channels=1 --bits=16 --rate=44100 audio.wav trim 0 65
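
Before sending a WAV file, it can help to confirm that its header matches the parameters the request will claim. A minimal sketch using the standard-library `wave` module; `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Return (channels, bits_per_sample, sample_rate) read from a WAV header."""
    with wave.open(path, 'rb') as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()
```

For a file recorded with the command above, the expectation would be one channel, 16 bits, and a 44100 Hz sample rate.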

I then transcribe the audio with the following code:

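import io
import time

from google.cloud import speech


def transcribe_file(speech_file):
    """Transcribe a local audio file asynchronously and print the transcripts."""
    # Client.sample() and speech_api.async_recognize() belong to the client
    # library release used at the time; later releases changed this interface.
    speech_client = speech.Client()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
        audio_sample = speech_client.sample(
            content,
            source_uri=None,
            encoding='LINEAR16',
            sample_rate=44100)

    operation = speech_client.speech_api.async_recognize(audio_sample)

    # Poll every 2 seconds, giving up after 100 attempts.
    retry_count = 100
    while retry_count > 0 and not operation.complete:
        retry_count -= 1
        time.sleep(2)
        operation.poll()

    if not operation.complete:
        print('Operation not complete and retry limit reached.')
        return

    alternatives = operation.results
    for alternative in alternatives:
        print('Transcript: {}'.format(alternative.transcript))


transcribe_file('audio.wav')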

Note that in my example I used the new client library, which makes it easier to access the API. This sample code is the starting point I took my example from.