Is it possible to send real-time data to Bing Speech Recognition?

Question

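I am writing an application that should receive audio and send it to the Bing recognition API to get text. I used the service library, and it works with WAV files. So I wrote my own stream class that receives audio from a microphone or the network (RTP) and sends it to the recognition API. When I prepend a WAV header to the audio stream, it works for a few seconds.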

Debugging showed that the recognition API reads from the stream faster than the audio source (16 kHz sample rate, 16-bit, mono) fills it.

So my question is: is there a way to use the recognition API with a real-time (continuous) audio stream?

I know there is an example with a microphone client, but it only works with the microphone, and I need it for different sources.

Answer 1

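I found a solution to my problem. I wrote a class AudioStream, derived from Stream, that buffers the input and, when Read is called while its buffer is empty, waits for data. This keeps the recognizer from stopping, because Read always returns a value > 0 (a return value of 0 would signal end of stream). Here is the important part of the class: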

public class AudioStream : Stream {
    private AutoResetEvent _waitEvent = new AutoResetEvent(false);

    internal void AddData(byte[] buffer, int count) {
        _buffer.Add(buffer, count);
        // Enable Read
        _waitEvent.Set();
    }

    public override int Read(byte[] buffer, int offset, int count) {
        int readCount = 0;
        if (_buffer.Empty) {
            // Wait for input
            _waitEvent.WaitOne();
        }
        ......
        // Fill buffer from _buffer and set readCount to the number of bytes copied

        _waitEvent.Reset();
        return readCount;
    }

    protected override void Dispose(bool disposing) {
        // Make sure that there is no waiting Read
        // Clear buffer, dispose wait event etc.
    }
    ......
}

Because audio data arrives continuously, the Read method never "hangs" for more than a few milliseconds (for example, RTP packets arrive every 20 ms).

Answer 2

If you want to use a source other than the microphone, you can use the DataRecognitionClient class, which you create by calling SpeechRecognitionServiceFactory's CreateDataClient method. Once you have the client object, you can take audio from any source (microphone, network, a file, etc.) and send it for processing with the client's SendAudio method. Each time you receive a new audio buffer, you call SendAudio again.

As you send audio with SendAudio, you will receive partial recognition results in real time (or close to it) in the form of the client's OnPartialResponse events.

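When you have finished sending audio, you signal to the client that you are ready for the final recognition results by calling EndAudio. You should then receive an OnResponseReceived event from the client containing the final recognition hypotheses.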
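The send loop described above might look roughly like the sketch below. It is written against a hypothetical IAudioSink interface that only mirrors the two method names mentioned in this answer (SendAudio and EndAudio); the real DataRecognitionClient signatures, client creation, and event wiring are not shown and should be checked against the library itself. AudioPump, MemorySink, and the 640-byte default chunk (20 ms of 16 kHz, 16-bit, mono audio) are assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the two client methods named in the answer.
public interface IAudioSink
{
    void SendAudio(byte[] buffer, int count);
    void EndAudio();
}

public static class AudioPump
{
    // Forwards audio from any source stream to the sink in chunks.
    // Returns the total number of bytes forwarded.
    public static long Pump(Stream source, IAudioSink sink, int chunkSize = 640)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        try
        {
            int read;
            // Read returns 0 only at end of stream, which ends the loop.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                sink.SendAudio(buffer, read);
                total += read;
            }
        }
        finally
        {
            // Always tell the sink that no more audio is coming.
            sink.EndAudio();
        }
        return total;
    }
}

// Illustrative sink that just records what it was given.
public sealed class MemorySink : IAudioSink
{
    public int Calls { get; private set; }
    public bool Ended { get; private set; }
    public MemoryStream Data { get; } = new MemoryStream();

    public void SendAudio(byte[] buffer, int count)
    {
        Calls++;
        Data.Write(buffer, 0, count);
    }

    public void EndAudio() { Ended = true; }
}
```

With a real client, a thin adapter implementing IAudioSink would forward both calls to the client object, and the source could be a file, a network stream, or any other Stream.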

Answer 3

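Adding some supporting information on this topic: the stream implementation must support concurrent read/write operations and block when no data is available.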
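One way to meet both requirements is a small producer/consumer stream guarded by a lock, as in the minimal sketch below. It is not the code from Answer 1: the class name BlockingAudioStream and the Complete method are hypothetical, and it uses Monitor.Wait/PulseAll instead of an AutoResetEvent. Read blocks while the queue is empty and returns 0 only after Complete (or Dispose) has been called.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical name; a one-way pipe from an audio producer to a reader.
public sealed class BlockingAudioStream : Stream
{
    private readonly Queue<byte[]> _chunks = new Queue<byte[]>();
    private readonly object _gate = new object();
    private byte[] _current;   // chunk currently being drained by Read
    private int _currentPos;   // read position inside _current
    private bool _completed;   // set once the producer has no more audio

    // Producer side: may be called from any thread.
    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count <= 0) return;
        var copy = new byte[count];
        Buffer.BlockCopy(buffer, offset, copy, 0, count);
        lock (_gate)
        {
            if (_completed) throw new InvalidOperationException("Stream already completed.");
            _chunks.Enqueue(copy);
            Monitor.PulseAll(_gate);   // wake a waiting Read
        }
    }

    // Producer side: signals end of audio so Read can finally return 0.
    public void Complete()
    {
        lock (_gate) { _completed = true; Monitor.PulseAll(_gate); }
    }

    // Consumer side: blocks until data arrives; returns 0 only after Complete().
    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        lock (_gate)
        {
            while (_current == null && _chunks.Count == 0 && !_completed)
                Monitor.Wait(_gate);
            if (_current == null)
            {
                if (_chunks.Count == 0) return 0;   // completed and drained
                _current = _chunks.Dequeue();
                _currentPos = 0;
            }
            int n = Math.Min(count, _current.Length - _currentPos);
            Buffer.BlockCopy(_current, _currentPos, buffer, offset, n);
            _currentPos += n;
            if (_currentPos == _current.Length) _current = null;
            return n;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) Complete();   // release any reader still waiting
        base.Dispose(disposing);
    }

    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

The capture thread (microphone callback or RTP receiver) calls Write for each packet, while the recognizer reads from the same instance on its own thread. The queue is unbounded here, so a slow reader lets memory grow; a bounded queue would be a reasonable refinement.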

c#

speech-to-text

bing

microsoft-cognitive