CMUSphinx never recognizes any word from audio files
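Sphinx doesn't seem to recognize or process my audio files: it accepts the audio stream and returns an empty result (the SpeechResult contains no words). I don't think there is anything wrong with the audio files I'm using, since I've tried several and none of them work. Does anyone have an audio file that is known to work? Is there anything that stands out that could cause the stream to produce no transcription?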
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class AutomaticSpeechRecognition {
    public static void main(String[] args) throws IOException {
        // Use the default US English models bundled in sphinx4-data.
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.dmp");
        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

        // Also tried: new FileInputStream("E:/1video/hello-5.mp3")
        // Also tried: AutomaticSpeechRecognition.class.getResourceAsStream("/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav")
        File file = new File("E:/1video/bargain_not.wav");
        try (InputStream is = new FileInputStream(file)) {
            recognizer.startRecognition(is);
            SpeechResult result;
            // getResult() returns null once the stream is exhausted.
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getResult());
                System.out.println(result.getHypothesis());
                System.out.println(result.getWords());
            }
            recognizer.stopRecognition();
        }
    }
}
Here is the output from running it; nothing appears to fail:
09:31:13.430 INFO unitManager CI Unit: *+NSN+
09:31:13.433 INFO unitManager CI Unit: *+SPN+
09:31:13.433 INFO unitManager CI Unit: AA
09:31:13.434 INFO unitManager CI Unit: AE
09:31:13.434 INFO unitManager CI Unit: AH
09:31:13.434 INFO unitManager CI Unit: AO
09:31:13.434 INFO unitManager CI Unit: AW
09:31:13.434 INFO unitManager CI Unit: AY
09:31:13.434 INFO unitManager CI Unit: B
09:31:13.434 INFO unitManager CI Unit: CH
09:31:13.434 INFO unitManager CI Unit: D
09:31:13.434 INFO unitManager CI Unit: DH
09:31:13.434 INFO unitManager CI Unit: EH
09:31:13.435 INFO unitManager CI Unit: ER
09:31:13.435 INFO unitManager CI Unit: EY
09:31:13.435 INFO unitManager CI Unit: F
09:31:13.435 INFO unitManager CI Unit: G
09:31:13.435 INFO unitManager CI Unit: HH
09:31:13.435 INFO unitManager CI Unit: IH
09:31:13.435 INFO unitManager CI Unit: IY
09:31:13.435 INFO unitManager CI Unit: JH
09:31:13.435 INFO unitManager CI Unit: K
09:31:13.435 INFO unitManager CI Unit: L
09:31:13.435 INFO unitManager CI Unit: M
09:31:13.436 INFO unitManager CI Unit: N
09:31:13.436 INFO unitManager CI Unit: NG
09:31:13.436 INFO unitManager CI Unit: OW
09:31:13.436 INFO unitManager CI Unit: OY
09:31:13.436 INFO unitManager CI Unit: P
09:31:13.436 INFO unitManager CI Unit: R
09:31:13.436 INFO unitManager CI Unit: S
09:31:13.436 INFO unitManager CI Unit: SH
09:31:13.436 INFO unitManager CI Unit: T
09:31:13.436 INFO unitManager CI Unit: TH
09:31:13.436 INFO unitManager CI Unit: UH
09:31:13.437 INFO unitManager CI Unit: UW
09:31:13.437 INFO unitManager CI Unit: V
09:31:13.437 INFO unitManager CI Unit: W
09:31:13.437 INFO unitManager CI Unit: Y
09:31:13.437 INFO unitManager CI Unit: Z
09:31:13.437 INFO unitManager CI Unit: ZH
09:31:14.014 INFO autoCepstrum Cepstrum component auto-configured as follows: autoCepstrum {MelFrequencyFilterBank, Denoise, DiscreteCosineTransform2, Lifter}
09:31:14.030 INFO dictionary Loading dictionary from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict
09:31:14.132 INFO dictionary Loading filler dictionary from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us/noisedict
09:31:14.132 INFO acousticModelLoader Loading tied-state acoustic model from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us
09:31:14.133 INFO acousticModelLoader Pool means Entries: 16128
09:31:14.133 INFO acousticModelLoader Pool variances Entries: 16128
09:31:14.133 INFO acousticModelLoader Pool transition_matrices Entries: 42
09:31:14.133 INFO acousticModelLoader Pool senones Entries: 5126
09:31:14.133 INFO acousticModelLoader Gaussian weights: mixture_weights. Entries: 15378
09:31:14.133 INFO acousticModelLoader Pool senones Entries: 5126
09:31:14.133 INFO acousticModelLoader Context Independent Unit Entries: 42
09:31:14.133 INFO acousticModelLoader HMM Manager: 137095 hmms
09:31:14.134 INFO acousticModel CompositeSenoneSequences: 0
09:31:14.134 INFO largeTrigramModel Loading n-gram language model from: jar:file:/C:/Users/Kevin/.m2/repository/edu/cmu/sphinx/sphinx4-data/1.0-SNAPSHOT/sphinx4-data-1.0-SNAPSHOT.jar!/edu/cmu/sphinx/models/en-us/en-us.lm.dmp
09:31:14.807 INFO largeTrigramModel 1-grams: 19794
09:31:14.807 INFO largeTrigramModel 2-grams: 1377200
09:31:14.807 INFO largeTrigramModel 3-grams: 3178194
09:31:15.582 INFO lexTreeLinguist Max CI Units 43
09:31:15.583 INFO lexTreeLinguist Unit table size 79507
09:31:15.585 INFO speedTracker # ----------------------------- Timers----------------------------------------
09:31:15.585 INFO speedTracker # Name Count CurTime MinTime MaxTime AvgTime TotTime
09:31:15.586 INFO speedTracker Load Dictionary 1 0.1020s 0.1020s 0.1020s 0.1020s 0.1020s
09:31:15.586 INFO speedTracker Load LM 1 0.6730s 0.6730s 0.6730s 0.6730s 0.6730s
09:31:15.586 INFO speedTracker Compile 1 0.7760s 0.7760s 0.7760s 0.7760s 0.7760s
09:31:15.586 INFO speedTracker Load AM 1 1.5450s 1.5450s 1.5450s 1.5450s 1.5450s
09:31:15.608 INFO speedTracker This Time Audio: 1.94s Proc: 0.01s Speed: 0.00 X real time
09:31:15.608 INFO speedTracker Total Time Audio: 1.94s Proc: 0.01s 0.00 X real time
09:31:15.609 INFO memoryTracker Mem Total: 454.75 Mb Free: 262.35 Mb
09:31:15.609 INFO memoryTracker Used: This: 192.40 Mb Avg: 192.40 Mb Max: 192.40 Mb
09:31:15.610 INFO largeTrigramModel LM Cache Size: 0 Hits: 0 Misses: 0
<s> </s>
As Nikolay Shmyrev said, the file must be a 16 kHz, 16-bit, mono MS WAV file. Such a file can be recorded with Audacity.
Use File > Export and make sure to select WAV (Microsoft) signed 16-bit PCM.
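Before feeding a file to the recognizer, it may help to confirm that it really has the format described above. The following is a minimal sketch using the JDK's standard javax.sound.sampled package; the class name WavFormatCheck and the helper isSphinxCompatible are hypothetical names chosen for illustration, and the little-endian check is an assumption based on how PCM WAV files are normally stored.

```java
import java.io.File;
import java.io.IOException;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavFormatCheck {

    /**
     * Hypothetical helper: returns true if the format is 16 kHz, 16-bit,
     * mono, signed PCM (little-endian, which is assumed here because that
     * is the usual byte order of PCM WAV files).
     */
    static boolean isSphinxCompatible(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1
                && !format.isBigEndian();
    }

    public static void main(String[] args)
            throws IOException, UnsupportedAudioFileException {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        File file = new File(args[0]);
        // AudioSystem parses the file header and reports the actual format.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat format = in.getFormat();
            System.out.println("Detected format: " + format);
            System.out.println(isSphinxCompatible(format)
                    ? "Format matches 16 kHz 16-bit mono PCM."
                    : "Format differs; re-export the file before recognition.");
        }
    }
}
```

Running it with the path of the WAV file as the only argument prints the detected format. An MP3 file such as the one in the commented-out line of the question would normally be rejected with an UnsupportedAudioFileException, since the stock JDK does not decode MP3.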