Using GStreamer with Google Speech API (Streaming Transcribe) in C++

Question

I am using the Google Speech API from Google Cloud Platform to get speech-to-text for streamed audio. I have already made REST API calls with curl POST requests using a GCP short audio file.

I have read the Google Streaming Recognize documentation, which says "Streaming speech recognition is available via gRPC only."
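Over gRPC, a streaming recognition call is a sequence of request messages written by the client. According to the StreamingRecognize reference, the first message carries only the streaming configuration and every later message carries only audio bytes. The sketch below shows just that ordering. `Request` is a hypothetical stand-in for the generated `StreamingRecognizeRequest` protobuf class, and `build_stream` is a hypothetical helper, not part of any Google library.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated StreamingRecognizeRequest message:
// exactly one of the two fields is set in each request.
struct Request {
    std::optional<std::string> streaming_config;  // e.g. encoding, sample rate, language
    std::optional<std::string> audio_content;     // raw audio bytes
};

// Builds the request sequence for one stream: config first, then audio chunks.
std::vector<Request> build_stream(const std::string& config,
                                  const std::string& audio,
                                  std::size_t chunk_size) {
    std::vector<Request> requests;
    requests.push_back(Request{config, std::nullopt});  // first message: config only
    if (chunk_size == 0) return requests;               // avoid an endless loop
    for (std::size_t pos = 0; pos < audio.size(); pos += chunk_size) {
        // every later message: audio only
        requests.push_back(Request{std::nullopt, audio.substr(pos, chunk_size)});
    }
    return requests;
}
```

In real code each element would be written to the gRPC stream in order, while a separate reader collects the responses.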

I installed gRPC (and protobuf) on OpenSuse Leap 15.0. Here is a screenshot of the directory.

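Next, I tried to run the streaming_transcribe sample from this link. I found that the sample program takes a local file as input but simulates microphone input (grabbing 64K chunks sequentially) and then sends the data to Google's servers.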

As an initial test to check whether gRPC is set up correctly on my system, I ran make run_tests. I changed the Makefile to:

# ... (rest of the original Makefile, unchanged) ...
.PHONY: all run_tests clean
all: streaming_transcribe
googleapis.ar: $(GOOGLEAPIS_CCS:.cc=.o)
	ar r $@ $?
streaming_transcribe: streaming_transcribe.o parse_arguments.o googleapis.ar
	$(CXX) $^ $(LDFLAGS) -o $@
run_tests: streaming_transcribe
	./streaming_transcribe -b 16000 resources/audio.raw
	./streaming_transcribe --bitrate 16000 resources/audio2.raw
	./streaming_transcribe resources/audio.flac
	./streaming_transcribe resources/quit.raw
clean:
	rm -f *.o streaming_transcribe \
	    googleapis.ar \
	    $(GOOGLEAPIS_CCS:.cc=.o)

This does not work (nor does the original Makefile). However, the streaming_transcribe.o file is created after running make, so I ran the program manually and got the following response:

Any suggestions on how to run the tests, and how to use GStreamer instead of the function that simulates microphone audio?

Answer 1

Maybe a dedicated sound card could listen to the RTSP stream?

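import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.RecognitionConfig.AudioEncoding;
import com.google.cloud.speech.v1.SpeechClient;

try (SpeechClient speechClient = SpeechClient.create()) {
  // 16-bit linear PCM at 44.1 kHz, two channels, each channel recognized separately
  RecognitionConfig config =
      RecognitionConfig.newBuilder()
          .setEncoding(AudioEncoding.LINEAR16)
          .setLanguageCode("en-US")
          .setSampleRateHertz(44100)
          .setAudioChannelCount(2)
          .setEnableSeparateRecognitionPerChannel(true)
          .build();
  // ... send the audio to speechClient using this config ...
}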
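The config above has to match the audio that is actually sent. For uncompressed LINEAR16 (2 bytes per sample per channel), the byte rate follows directly from the sample rate and channel count, and that tells you how much audio a chunk of a given size holds. A small arithmetic sketch (the helper names are hypothetical):

```cpp
#include <cstdint>

// LINEAR16 = uncompressed 16-bit signed PCM, i.e. 2 bytes per sample per channel.
constexpr std::uint32_t kBytesPerSample = 2;

// Bytes of LINEAR16 audio produced per second at the given rate and channel count.
constexpr std::uint64_t linear16_bytes_per_second(std::uint32_t sample_rate_hz,
                                                  std::uint32_t channels) {
    return std::uint64_t{sample_rate_hz} * channels * kBytesPerSample;
}

// Playback duration, in whole milliseconds, of `bytes` of LINEAR16 audio.
constexpr std::uint64_t linear16_duration_ms(std::uint64_t bytes,
                                             std::uint32_t sample_rate_hz,
                                             std::uint32_t channels) {
    return bytes * 1000 / linear16_bytes_per_second(sample_rate_hz, channels);
}

// 44.1 kHz stereo, as configured above: 44100 * 2 channels * 2 bytes.
static_assert(linear16_bytes_per_second(44100, 2) == 176400, "44.1 kHz stereo");
```

With these settings a 64 KiB chunk (65536 bytes) holds about 371 ms of audio; at 16 kHz mono the same chunk holds 2048 ms.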

Answer 2

how to run the test

Follow the instructions at cpp-docs-samples (Prerequisite: install grpc, protobuf, and googleapis) and set up the environment as described at the link above.

gstreamer instead of the function used for simulating the mic-phone audio

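For this program, I created the following sender pipeline, which streams a WAV file as 16-bit linear PCM (L16) over RTP/UDP: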

gst-launch-1.0 filesrc location=/path/to/file/FOO.wav ! wavparse ! audioconvert ! audioresample ! audio/x-raw,format=S16BE,channels=1,rate=44100 ! rtpL16pay ! udpsink host=xxx.xxx.xxx.xxx port=yyyy
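rtpL16pay wraps the samples in RTP packets, and on the receiving side rtpL16depay removes the RTP header again. If you read the UDP socket directly from your C++ program instead of running a second GStreamer pipeline, you would have to locate the payload yourself. Below is a minimal sketch based on the RFC 3550 fixed header (12 bytes, plus optional CSRC list, header extension, and padding). `rtp_payload` is a hypothetical helper, and it ignores reordering, loss, and jitter, which a real receiver must handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Returns {offset, length} of the payload inside an RTP packet, or nullopt if malformed.
std::optional<std::pair<std::size_t, std::size_t>>
rtp_payload(const std::vector<std::uint8_t>& pkt) {
    if (pkt.size() < 12) return std::nullopt;          // fixed header is 12 bytes
    const std::uint8_t version = pkt[0] >> 6;
    if (version != 2) return std::nullopt;             // only RTP version 2
    const bool padding = pkt[0] & 0x20;                // P bit
    const bool extension = pkt[0] & 0x10;              // X bit
    const std::size_t csrc_count = pkt[0] & 0x0F;      // CC field
    std::size_t offset = 12 + 4 * csrc_count;          // skip the CSRC list
    if (pkt.size() < offset) return std::nullopt;
    if (extension) {
        // Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        if (pkt.size() < offset + 4) return std::nullopt;
        const std::size_t ext_words =
            (static_cast<std::size_t>(pkt[offset + 2]) << 8) | pkt[offset + 3];
        offset += 4 + 4 * ext_words;
        if (pkt.size() < offset) return std::nullopt;
    }
    std::size_t end = pkt.size();
    if (padding) {
        // The last byte counts the padding bytes, including itself.
        const std::size_t pad = pkt.back();
        if (pad == 0 || pad > end - offset) return std::nullopt;
        end -= pad;
    }
    return std::make_pair(offset, end - offset);
}
```

For L16 the payload bytes are big-endian samples, so they still need converting before they can be sent as LINEAR16.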

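The input can be switched to a FLAC or MP3 file by changing the appropriate elements in the pipeline. On the receiving side, the following pipeline depayloads the RTP stream and writes the audio to a raw file as 16-bit little-endian PCM: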

gst-launch-1.0 udpsrc port=yyyy ! "application/x-rtp,media=(string)audio,clock-rate=(int)44100,encoding-name=(string)L16,encoding-params=(string)1,channels=(int)1,payload=(int)96" ! rtpL16depay ! audioconvert ! audio/x-raw,format=S16LE ! filesink location=/path/to/where/you/want/to/dump/the/rtp/payloads/ABC.raw
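In this receiver, rtpL16depay outputs big-endian samples (S16BE), and audioconvert turns them into little-endian S16LE, which is what Google's LINEAR16 encoding expects. For 16-bit samples that conversion is only a byte swap per sample. The sketch below shows it for a program that handles the payload itself; `s16be_to_s16le` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Converts interleaved 16-bit big-endian PCM (RTP L16, GStreamer S16BE) into
// 16-bit little-endian PCM (GStreamer S16LE). Both formats use 2 bytes per sample,
// so this just swaps the two bytes of every sample.
std::vector<std::uint8_t> s16be_to_s16le(const std::vector<std::uint8_t>& be) {
    if (be.size() % 2 != 0) {
        throw std::invalid_argument("odd byte count: not whole 16-bit samples");
    }
    std::vector<std::uint8_t> le(be.size());
    for (std::size_t i = 0; i < be.size(); i += 2) {
        le[i] = be[i + 1];      // low byte first
        le[i + 1] = be[i];      // then high byte
    }
    return le;
}
```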

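Pulling the payloads from the RTP stream and writing them to a file is done in a separate thread from the one that sends the data to Google and reads the responses.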
