"backward.c"，第 421 行：无法将音频与文字记录对齐

Question

My script worked fine for speech recognition training until I recently tried to scale up and train on more data. Now it outputs this error:

ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the search is not reached

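What does this mean, and what should I do about it?

Model training appears to continue, but I'm not sure whether this is an error I can safely ignore.

I checked this link, but I'm pretty sure my audio is sampled at 16 kHz.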

Answer 1

As stated in the documentation:

Sometimes audio in your database doesn't match the transcription properly. For example transcription file has the line “Hello world” but in audio actually “Hello hello world” is pronounced. Training process usually detects that and emits this message in the logs. If there are too many such errors it most likely mean you misconfigured something, for example you had a mismatch between audio and the text caused by transcription reordering. Or input audio sample rate is wrong
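(An aside, not part of the quoted documentation.) Since a wrong sample rate is one of the causes named above, it can be worth verifying it across the whole data set rather than on a few files. Below is a minimal sketch in Python, assuming the training audio is stored as standard PCM WAV files; the function name `wrong_sample_rate_files` and the default of 16000 Hz are illustrative choices, not part of CMUSphinx.

```python
import wave


def wrong_sample_rate_files(paths, expected_hz=16000):
    """Return (path, actual_rate) for each WAV file whose rate differs from expected_hz.

    Assumption: every path points to a PCM WAV file readable by the
    standard-library wave module. Other formats (raw, SPH) are not handled.
    """
    mismatches = []
    for path in paths:
        # wave.open reads only the header here; no audio frames are decoded.
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
        if rate != expected_hz:
            mismatches.append((path, rate))
    return mismatches
```

An empty result means every file that was checked reports the expected rate. Files that were added when scaling up the data set are the first ones worth passing in.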

If there are few errors, you can ignore them. You might want to edit the transcription file to put there exact word which were pronounced, in the case above you need to edit the transcription file and put “Hello hello world” on corresponding line. You might want to filter such prompts because they affect acoustic model quality. In that case you need to enable forced alignment stage in training.
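To judge whether you are in the "few errors" or the "too many errors" case, you can count how often the message appears in the training logs relative to the number of utterances. The following is a rough sketch in Python; the function name `alignment_failure_rate` is hypothetical, and the pattern simply matches the message text shown in the question (with or without the misspelling).

```python
import re

# Matches the message as printed in the question ("trancript") and also
# the correctly spelled variant, in case the spelling differs elsewhere.
ALIGN_ERROR = re.compile(r"Failed to align audio to trans?cript")


def alignment_failure_rate(log_lines, total_utterances):
    """Return (failure_count, failure_fraction) for an iterable of log lines.

    Assumption: each failed utterance produces one matching line in the
    lines passed in. total_utterances is supplied by the caller, for
    example the number of lines in the transcription file.
    """
    failures = sum(1 for line in log_lines if ALIGN_ERROR.search(line))
    fraction = (failures / total_utterances) if total_utterances else 0.0
    return failures, fraction
```

Pass in the lines of a single log from one training pass. If logs from several passes are combined, the same utterance may be counted more than once and the fraction will be overstated.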

"backward.c"，第 421 行：无法将音频与文字记录对齐

"backward.c", line 421: Failed to align audio to trancript

cmusphinx