Sphinx 4 terrible accuracy
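I am trying to get Sphinx 4 working with my desktop application, but it recognizes speech correctly 0% of the time.
I am using the default language model and everything else from the sphinx4 data jar.
Code: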
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.Microphone;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class Speechy {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        Microphone micro = new Microphone(8000, 16, true, false);
        micro.startRecording();

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        // Start recognition process pruning previously cached data.
        recognizer.startRecognition(micro.getStream());
        while (true) {
            SpeechResult result = recognizer.getResult();
            System.out.println(result.getHypothesis());
        }
    }
}
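Microphone micro = new Microphone(8000, 16, true, false);
The default acoustic model expects 16 kHz audio, so the 8000 configured here is wrong; pass 16000 as the sample rate instead.
See also the tutorial: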
The top reasons for bad accuracy are:
A mismatch in the sample rate or number of channels of the incoming
audio, or a mismatch in the incoming audio bandwidth. The input must be
16 kHz (or 8 kHz, depending on the training data), 16-bit, mono
(single-channel), little-endian audio. You need to fix the sample rate
of the source by resampling (only if its rate is higher than that of
the training data). You should not upsample a file and decode it with
acoustic models trained on audio with a higher sampling rate. The audio
file format (sampling rate, number of channels) can be verified with
the command sox --i /path/to/audio/file. Find more information here:
What is sample rate
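
The format requirements quoted above can also be checked from Java before any audio reaches the recognizer. The following is only a minimal sketch using the standard javax.sound.sampled API; the class name AudioFormatCheck and the method mismatches are hypothetical helpers made up for this illustration and are not part of Sphinx 4. It assumes the model expects 16-bit signed mono little-endian PCM at the sample rate you pass in.

```java
import javax.sound.sampled.AudioFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Sphinx 4): compares an audio format with
// the 16-bit, mono, signed, little-endian PCM layout described in the tutorial.
final class AudioFormatCheck {

    // Returns one message per property that differs from the expected layout.
    // An empty list means the format matches.
    static List<String> mismatches(AudioFormat actual, float expectedRate) {
        List<String> problems = new ArrayList<>();
        if (actual.getSampleRate() != expectedRate) {
            problems.add("sample rate is " + actual.getSampleRate()
                    + " Hz, expected " + expectedRate + " Hz");
        }
        if (actual.getSampleSizeInBits() != 16) {
            problems.add("sample size is " + actual.getSampleSizeInBits()
                    + " bits, expected 16");
        }
        if (actual.getChannels() != 1) {
            problems.add("channel count is " + actual.getChannels()
                    + ", expected 1 (mono)");
        }
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(actual.getEncoding())) {
            problems.add("encoding is " + actual.getEncoding()
                    + ", expected PCM_SIGNED");
        }
        if (actual.isBigEndian()) {
            problems.add("byte order is big-endian, expected little-endian");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Same parameters as the Microphone in the question: 8000 Hz, 16-bit,
        // signed, little-endian (mono is assumed here).
        AudioFormat fromQuestion = new AudioFormat(8000f, 16, 1, true, false);
        // Check against the 16 kHz rate the default en-us model expects.
        for (String problem : mismatches(fromQuestion, 16000f)) {
            System.out.println(problem);
        }
    }
}
```

For a file on disk, the AudioFormat can be obtained with AudioSystem.getAudioInputStream(file).getFormat() and passed to the same check, which gives roughly the same information as the sox command.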