Getting blank results from deepspeech with portaudio in Rust
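I'm trying to combine portaudio with deepspeech (both through their Rust bindings) to build a speech recognition program. When I log the buffers I can see data, but when I call intermediate_decode I always get blank results. I assume I have either configured the audio incorrectly or set up the model incorrectly. It took me a lot of time just to get this far (I'm new to working with audio), so any help would be appreciated!
Here is the full source code: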
use deepspeech::Model;
use portaudio as pa;
use std::path::Path;

fn start_recognition(mut model: Model) {
    let pa = pa::PortAudio::new().expect("Unable to init PortAudio");
    let input_settings = pa.default_input_stream_settings(1, 16000.0, 1024).unwrap();
    let process_audio = move |pa::InputStreamCallbackArgs { buffer, .. }| {
        let mut stream = model
            .create_stream()
            .expect("Failed to create model stream");
        stream.feed_audio(&buffer);
        let text = stream.intermediate_decode();
        match text {
            Ok(t) => {
                if t.chars().count() > 0 {
                    println!("Text: {}", t)
                }
                pa::Continue
            }
            Err(err) => {
                eprintln!("Error: {:?}", err);
                pa::Complete
            }
        }
    };
    let mut stream = pa
        .open_non_blocking_stream(input_settings, process_audio)
        .expect("Unable to create audio stream");
    stream.start().expect("Unable to start audio stream");
    while let true = stream.is_active().unwrap() {}
}

fn get_model() -> Model {
    let dir_path = Path::new("src/models");
    let mut graph_name: Box<Path> = dir_path.join("output_graph.pb").into_boxed_path();
    for file in dir_path
        .read_dir()
        .expect("Specified model dir is not a dir")
    {
        if let Ok(f) = file {
            let file_path = f.path();
            if file_path.is_file() {
                if let Some(ext) = file_path.extension() {
                    if ext == "pb" || ext == "pbmm" {
                        graph_name = file_path.into_boxed_path();
                    }
                }
            }
        }
    }
    Model::load_from_files(&graph_name).unwrap()
}

fn main() {
    let model = get_model();
    start_recognition(model);
}
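It turns out the problem was in the process_audio callback. I needed to move the initialization of the model stream outside the callback. Creating a new stream on every call meant that each decode only ever saw a single 1024-frame buffer (64 ms at 16000 Hz), with no earlier audio carried over.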
fn start_recognition(mut model: Model) {
    let pa = pa::PortAudio::new().expect("Unable to init PortAudio");
    let input_settings = pa.default_input_stream_settings(1, 16000.0, 1024).unwrap();
    // Create the model stream once, so it keeps its state across callbacks.
    let mut stream = model
        .create_stream()
        .expect("Failed to create model stream");
    // The closure takes ownership of the model stream and feeds it every buffer.
    let process_audio = move |pa::InputStreamCallbackArgs { buffer, .. }| {
        stream.feed_audio(&buffer);
        let text = stream.intermediate_decode();
        match text {
            Ok(t) => {
                if t.chars().count() > 0 {
                    println!("Text: {}", t)
                }
                pa::Continue
            }
            Err(err) => {
                eprintln!("Error: {:?}", err);
                pa::Complete
            }
        }
    };
    // This shadows the model stream, which has already moved into the closure.
    let mut stream = pa
        .open_non_blocking_stream(input_settings, process_audio)
        .expect("Unable to create audio stream");
    stream.start().expect("Unable to start audio stream");
    while let true = stream.is_active().unwrap() {}
}
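To illustrate why moving the stream out of the callback matters, here is a minimal standalone sketch. It does not use deepspeech or portaudio at all: ToyStream, MIN_SAMPLES and the two count_nonempty_* functions are hypothetical names made up for this example, and the "recognizer" simply refuses to produce text until it has accumulated an assumed minimum amount of audio. It only models the state-keeping difference between the two versions of process_audio, not how the real decoder works.

```rust
/// Toy stand-in for a streaming recognizer (hypothetical, NOT the deepspeech API).
struct ToyStream {
    samples_seen: usize,
}

/// Assumption for this sketch: the toy needs half a second of 16 kHz audio
/// before it can produce any text.
const MIN_SAMPLES: usize = 8_000;

impl ToyStream {
    fn new() -> Self {
        ToyStream { samples_seen: 0 }
    }

    /// Accumulate audio; the state lives in `self` between calls.
    fn feed_audio(&mut self, buffer: &[i16]) {
        self.samples_seen += buffer.len();
    }

    /// Return text only once enough audio has been accumulated.
    fn intermediate_decode(&self) -> String {
        if self.samples_seen >= MIN_SAMPLES {
            format!("decoded after {} samples", self.samples_seen)
        } else {
            String::new()
        }
    }
}

/// Mirrors the question: a fresh stream is created for every buffer.
fn count_nonempty_fresh(buffers: &[Vec<i16>]) -> usize {
    buffers
        .iter()
        .filter(|buffer| {
            let mut stream = ToyStream::new(); // state is reset on every call
            stream.feed_audio(buffer);
            !stream.intermediate_decode().is_empty()
        })
        .count()
}

/// Mirrors the answer: one stream is created up front and reused.
fn count_nonempty_shared(buffers: &[Vec<i16>]) -> usize {
    let mut stream = ToyStream::new(); // state survives across buffers
    buffers
        .iter()
        .filter(|buffer| {
            stream.feed_audio(buffer);
            !stream.intermediate_decode().is_empty()
        })
        .count()
}

fn main() {
    // Ten buffers of 1024 frames each, the same buffer size as in the question.
    let buffers = vec![vec![0i16; 1024]; 10];
    println!("fresh stream per buffer: {} non-empty results", count_nonempty_fresh(&buffers));
    println!("one shared stream: {} non-empty results", count_nonempty_shared(&buffers));
}
```

With a fresh ToyStream per buffer, no call ever sees more than 1024 samples, so every result is empty. With the shared one, results start appearing once the running total passes the threshold, which is the same shape of fix as moving create_stream out of the callback.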