How to perform real-time speech recognition | Google Cloud Speech-to-Text
I'm trying to transcribe the audio coming out of my speakers.
I'm piping the sound from the speakers into a Node.js script (https://askubuntu.com/a/850174):
parec -d alsa_output.pci-0000_00_1b.0.analog-stereo.monitor --rate=16000 --channels=1 | node transcribe.js
Here is my transcribe.js:
const speech = require('@google-cloud/speech');

const client = new speech.SpeechClient();

// Must match the raw PCM that parec produces: signed 16-bit, 16 kHz, mono.
const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';

const request = {
  config: {
    encoding: encoding,
    sampleRateHertz: sampleRateHertz,
    languageCode: languageCode,
  },
  interimResults: false, // If you want interim results, set this to true
};

const recognizeStream = client
  .streamingRecognize(request)
  .on('error', console.error)
  .on('data', data => {
    // Some responses carry no results, so check before indexing into them.
    const result = data.results[0];
    if (result && result.alternatives[0]) {
      console.log(`Transcription: ${result.alternatives[0].transcript}`);
    }
  });

// Feed the raw audio arriving on stdin (from parec) into the recognizer.
process.stdin.pipe(recognizeStream);
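But streaming recognition in Google Cloud Speech-to-Text is limited to about one minute per stream, so I get the error "Exceeded maximum allowed stream duration of 65 seconds."
How can I split the stream into chunks, either using silence as the separator or into chunks of 30 seconds each?
We can pipe the audio into the sox utility to split it wherever there are 0.3 seconds of silence (below 0.1% amplitude), with no chunk longer than 55 seconds: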
sox -t raw -r 16k -e signed -b 16 -c 1 - ./chunks/output.wav silence 1 0.3 0.1% 1 0.3 0.1% trim 0 55 : newfile : restart
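In that command, `trim 0 55` is what keeps every piece under the limit. For the question's other option, fixed-length chunks without sox, one approach is to end the recognize stream after a fixed amount of audio and open a new one. At parec's settings (16000 samples/s, mono, 2 bytes per sample) one second of audio is 32,000 bytes, so a 30-second chunk is 960,000 bytes. The sketch below is a hypothetical helper (`createRotatingWriter` is not part of the Google client library); it cuts purely by byte count, so unlike the silence-based split it can cut a word in half, and it ignores backpressure for brevity.

```javascript
const { Writable } = require('stream');

// Raw PCM bytes per second for `parec --rate=16000 --channels=1`
// (16000 samples/s * 1 channel * 2 bytes per signed 16-bit sample).
const BYTES_PER_SECOND = 16000 * 1 * 2;

// Hypothetical helper: forwards incoming audio to a stream created by
// `createStream`, and replaces that stream with a fresh one after every
// `chunkSeconds` seconds of audio, so no single stream exceeds the limit.
// Backpressure from the inner stream is ignored to keep the sketch short.
function createRotatingWriter(createStream, chunkSeconds, bytesPerSecond = BYTES_PER_SECOND) {
  const maxBytes = chunkSeconds * bytesPerSecond;
  if (!(maxBytes > 0)) throw new RangeError('chunk size must be positive');
  let current = null;
  let written = 0;

  return new Writable({
    write(chunk, encoding, callback) {
      let offset = 0;
      while (offset < chunk.length) {
        if (current === null) {
          current = createStream(); // open the next recognize stream
          written = 0;
        }
        // Write only as many bytes as still fit into the current stream.
        const take = Math.min(maxBytes - written, chunk.length - offset);
        current.write(chunk.subarray(offset, offset + take));
        written += take;
        offset += take;
        if (written >= maxBytes) {
          current.end(); // this stream is full; the next byte opens a new one
          current = null;
        }
      }
      callback();
    },
    final(callback) {
      if (current !== null) current.end(); // close the partially filled stream
      callback();
    },
  });
}
```

With the question's transcribe.js, usage would look like `process.stdin.pipe(createRotatingWriter(() => client.streamingRecognize(request).on('data', onData), 30))`, where `onData` stands for the existing `'data'` handler.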
Now we can watch the chunks directory for new files and stream each one to the Google Cloud Speech-to-Text API.
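A minimal sketch of that last step, under a few assumptions: sox's `newfile` numbers the pieces `output001.wav`, `output002.wav`, and so on; only the highest-numbered file is still being written; and the helper names (`completedChunks`, `transcribeFile`, `watchChunks`) are hypothetical. Instead of streaming each file, it swaps in the client's synchronous `recognize` method, which accepts up to about one minute of audio, so 55-second chunks fit.

```javascript
const fs = require('fs');
const path = require('path');

// Numeric part of a chunk name, e.g. "output012.wav" -> 12.
const chunkNumber = name => Number(name.match(/\d+/)[0]);

// Hypothetical helper: lists chunk files that sox has finished writing.
// Assumes sox's `newfile` naming (output001.wav, output002.wav, ...) and
// that only the highest-numbered file is still open for writing.
function completedChunks(dir, seen) {
  return fs.readdirSync(dir)
    .filter(name => /^output\d+\.wav$/.test(name))
    .sort((a, b) => chunkNumber(a) - chunkNumber(b))
    .slice(0, -1) // skip the newest file; sox may still be writing it
    .filter(name => !seen.has(name))
    .map(name => path.join(dir, name));
}

let client = null; // Speech client, created on first use

// Sends one finished chunk (at most 55 s) to Speech-to-Text with the
// synchronous recognize method and resolves to its transcript.
async function transcribeFile(file) {
  if (client === null) {
    const speech = require('@google-cloud/speech');
    client = new speech.SpeechClient();
  }
  const [response] = await client.recognize({
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
    audio: { content: fs.readFileSync(file).toString('base64') },
  });
  return response.results
    .map(result => result.alternatives[0]?.transcript ?? '')
    .join(' ');
}

// Polls `dir` every `intervalMs` ms and transcribes each finished chunk once,
// in order. A chunk whose request fails is logged and not retried.
// Returns the timer so the caller can stop it with clearInterval.
function watchChunks(dir, transcribe = transcribeFile, intervalMs = 1000) {
  const seen = new Set();
  let busy = false;
  return setInterval(async () => {
    if (busy) return; // the previous poll is still sending files
    busy = true;
    try {
      for (const file of completedChunks(dir, seen)) {
        seen.add(path.basename(file));
        console.log(`Transcription: ${await transcribe(file)}`);
      }
    } catch (err) {
      console.error(err);
    } finally {
      busy = false;
    }
  }, intervalMs);
}
```

Run `watchChunks('./chunks')` in a separate Node process while the parec | sox pipeline is running. Polling is used instead of `fs.watch` because `fs.watch` fires on every write and behaves differently across platforms. The newest chunk is only picked up once a later file appears, so after stopping sox the final chunk has to be transcribed separately.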