How to end Google Speech-to-Text streamingRecognize gracefully and get back the pending text results?
I'd like to be able to end a Google speech-to-text stream (created with streamingRecognize) and get back the pending SR (speech recognition) results.
In a nutshell, the relevant Node.js code:
// create SR stream
const stream = speechClient.streamingRecognize(request);
// observe data event
const dataPromise = new Promise(resolve => stream.on('data', resolve));
// observe error event
const errorPromise = new Promise((resolve, reject) => stream.on('error', reject));
// observe finish event
const finishPromise = new Promise(resolve => stream.on('finish', resolve));
// send the audio
stream.write(audioChunk);
// for testing purposes only, give the SR stream 2 seconds to absorb the audio
await new Promise(resolve => setTimeout(resolve, 2000));
// end the SR stream gracefully, by observing the completion callback
const endPromise = util.promisify(callback => stream.end(callback))();
// a 5 seconds test timeout
const timeoutPromise = new Promise(resolve => setTimeout(resolve, 5000));
// finishPromise wins the race here
await Promise.race([
dataPromise, errorPromise, finishPromise, endPromise, timeoutPromise]);
// endPromise wins the race here
await Promise.race([
dataPromise, errorPromise, endPromise, timeoutPromise]);
// timeoutPromise wins the race here
await Promise.race([dataPromise, errorPromise, timeoutPromise]);
// I don't see any data or error events, dataPromise and errorPromise don't get settled
My experience is that the SR stream ends successfully, but I don't get any data or error events; neither dataPromise nor errorPromise gets resolved or rejected.
How do I signal the end of the audio, close the SR stream and still get the pending SR results?
I need to stick with the streamingRecognize API because the audio I'm streaming is real-time, even though it may stop abruptly.
To clarify, it works as long as I keep streaming the audio, and I do receive real-time SR results. However, when I send the final audio chunk and end the stream as above, I don't get the final results I'd otherwise expect.
To get the final results, I actually have to keep streaming silence for several more seconds, which is likely to increase the speech-to-text bill. I feel there must be a better way to get them.
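The "keep streaming silence" workaround described above might be sketched roughly as follows. This is a hypothetical helper, not a fix: the flushWithSilence name, the chunk length and the timeout are all made-up assumptions, and zero-filled buffers stand in for silence in LINEAR16 audio.

```javascript
// Sketch of the "keep streaming silence" workaround described above.
// Assumptions (not from the original post): the flushWithSilence helper name,
// the chunk length and the timeout are all made up for illustration.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;  // LINEAR16 = 16-bit signed PCM
const CHUNK_MS = 100;
// a zero-filled buffer is silence in signed 16-bit PCM
const silenceChunk = Buffer.alloc(SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_MS / 1000);

async function flushWithSilence(stream, maxMs = 3000) {
  let gotFinal = false;
  stream.on('data', data => {
    // on a real SR stream, data.results[0].isFinal marks the final result
    if (data.results?.[0]?.isFinal) gotFinal = true;
  });
  const deadline = Date.now() + maxMs;
  while (!gotFinal && Date.now() < deadline) {
    stream.write(silenceChunk);
    await new Promise(resolve => setTimeout(resolve, CHUNK_MS));
  }
  stream.end();
}
```

The downside is exactly the one noted above: every silence chunk is billed as audio.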
Update: It appears the only proper time to end a streamingRecognize stream is upon a data event where StreamingRecognitionResult.is_final is true. Likewise, it appears we're expected to keep streaming the audio until that data event fires, in order to get any results at all, final or interim.
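Based on that observation, ending the stream only once a final result has arrived might be sketched like this. endOnFinalResult is a hypothetical helper name; the isFinal check mirrors the data handler used elsewhere in this post.

```javascript
// Sketch: defer ending the stream until a final result has arrived.
// endOnFinalResult is a hypothetical helper name.
function endOnFinalResult(stream) {
  stream.on('data', data => {
    if (data.results?.[0]?.isFinal) {
      // all pending results for the utterance have been delivered;
      // ending here should not lose any transcription
      stream.end();
    }
  });
}
```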
This looks like a bug to me; filing an issue.
Update: It now appears to have been confirmed as a bug. Until it's fixed, I'm looking for a potential workaround.
Update: For future reference, here is the list of the currently and previously tracked issues involving streamingRecognize.
I'd expect this to be a common problem for those who use streamingRecognize; it's surprising it hasn't been reported before. I've also submitted it as a bug to issuetracker.google.com.
Re: "I'm looking for a potential workaround." Have you considered extending SpeechClient as a base class? I don't have credentials to test it, but you can extend SpeechClient with your own class and then call the internal close() method as needed. The close() method closes the SpeechClient and resolves the outstanding promises.
Alternatively, you could also Proxy the SpeechClient() and intercept/respond as needed. But since your intent is to close it, the following might be your workaround:
const speech = require('@google-cloud/speech');

class ClientProxy extends speech.SpeechClient {
  constructor() {
    super();
  }

  myCustomFunction() {
    this.close();
  }
}

const clientProxy = new ClientProxy();
try {
  clientProxy.myCustomFunction();
} catch (err) {
  console.log("myCustomFunction generated error: ", err);
}
Since this is a bug, I don't know whether it suits your case, but I've used this.recognizeStream.end(); multiple times in different situations and it worked. However, my code was a bit different...
This feed may work for you:
https://groups.google.com/g/cloud-speech-discuss/c/lPaTGmEcZQk/m/Kl4fbHK2BQAJ
My bad: unsurprisingly, this turned out to be an obscure race condition in my own code.
I've put together a self-contained sample that works as expected (gist). It helped me track the issue down. Hopefully it'll help others, and my future self:
// A simple streamingRecognize workflow,
// tested with Node v15.0.1, by @noseratio
import fs from 'fs';
import path from "path";
import url from 'url';
import util from "util";
import timers from 'timers/promises';
import speech from '@google-cloud/speech';
export {}
// need a 16-bit, 16KHz raw PCM audio
const filename = path.join(path.dirname(url.fileURLToPath(import.meta.url)), "sample.raw");
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';
const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false // If you want interim results, set this to true
};
// init SpeechClient
const client = new speech.v1p1beta1.SpeechClient();
await client.initialize();
// Stream the audio to the Google Cloud Speech API
const stream = client.streamingRecognize(request);
// log all data
stream.on('data', data => {
  const result = data.results[0];
  console.log(`SR results, final: ${result.isFinal}, text: ${result.alternatives[0].transcript}`);
});
// log all errors
stream.on('error', error => {
  console.warn(`SR error: ${error.message}`);
});
// observe data event
const dataPromise = new Promise(resolve => stream.once('data', resolve));
// observe error event
const errorPromise = new Promise((resolve, reject) => stream.once('error', reject));
// observe finish event
const finishPromise = new Promise(resolve => stream.once('finish', resolve));
// observe close event
const closePromise = new Promise(resolve => stream.once('close', resolve));
// we could just pipe it:
// fs.createReadStream(filename).pipe(stream);
// but we want to simulate the web socket data
// read RAW audio as Buffer
const data = await fs.promises.readFile(filename, null);
// simulate multiple audio chunks
console.log("Writing...");
const chunkSize = 4096;
for (let i = 0; i < data.length; i += chunkSize) {
  stream.write(data.slice(i, i + chunkSize));
  await timers.setTimeout(50);
}
console.log("Done writing.");
console.log("Before ending...");
await util.promisify(c => stream.end(c))();
console.log("After ending.");
// race for events
await Promise.race([
  errorPromise.catch(() => console.log("error")),
  dataPromise.then(() => console.log("data")),
  closePromise.then(() => console.log("close")),
  finishPromise.then(() => console.log("finish"))
]);
console.log("Destroying...");
stream.destroy();
console.log("Final timeout...");
await timers.setTimeout(1000);
console.log("Exiting.");
Output:
Writing...
Done writing.
Before ending...
SR results, final: true, text: this is a test I'm testing voice recognition This Is the End
After ending.
data
finish
Destroying...
Final timeout...
close
Exiting.
To test it, a 16-bit/16KHz raw PCM audio file is required. An arbitrary WAV file won't work as-is, because it contains a header with metadata.