JaroWinklerDistance in Lucene is returning strange results
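I have a file containing some phrases. Using Lucene's JaroWinklerDistance, it should return the phrases from that file that are most similar to my input.
Here is an example of my problem.
We have a file containing: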
//phrases.txt
this is goodd
this is good
this is god
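If my input is "this is good", it should return 'this is good' from the file first, because its similarity score is the maximum (1). But for some reason it only returns "this is goodd" and "this is god"!
Here is my code: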
try {
    SpellChecker spellChecker = new SpellChecker(new RAMDirectory(), new JaroWinklerDistance());
    Dictionary dictionary = new PlainTextDictionary(new File("src/main/resources/words.txt").toPath());
    IndexWriterConfig iwc = new IndexWriterConfig(new ShingleAnalyzerWrapper());
    spellChecker.indexDictionary(dictionary, iwc, false);

    String wordForSuggestions = "this is good";
    int suggestionsNumber = 5;
    String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber, 0.8f);

    if (suggestions != null && suggestions.length > 0) {
        for (String word : suggestions) {
            System.out.println("Did you mean:" + word);
        }
    } else {
        System.out.println("No suggestions found for word:" + wordForSuggestions);
    }
} catch (IOException e) {
    e.printStackTrace();
}
suggestSimilar will not return a suggestion that is identical to the input. Quoting the source:
// don't suggest a word for itself, that would be silly
If you want to know whether wordForSuggestions is in the dictionary, use the exist method:
if (spellChecker.exist(wordForSuggestions)) {
    // do what you want for an, apparently, correctly spelled word
}
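To see why the two near-misses still clear the 0.8f threshold while the exact phrase would have scored 1, here is a minimal, self-contained Jaro-Winkler sketch. It is not Lucene's implementation: the class name JaroWinklerSketch, the 0.7 boost threshold, the 0.1 prefix scale and the 4-character prefix cap are assumptions taken from the textbook definition of the metric, so the exact values may differ slightly from what Lucene computes.

```java
// Hypothetical stand-alone sketch of the textbook Jaro-Winkler similarity.
// It does not call Lucene and is not Lucene's JaroWinklerDistance.
final class JaroWinklerSketch {

    // Plain Jaro similarity in [0, 1].
    static double jaro(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        // Characters only count as matching within this distance of each other.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] aMatched = new boolean[a.length()];
        boolean[] bMatched = new boolean[b.length()];
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(b.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!bMatched[j] && a.charAt(i) == b.charAt(j)) {
                    aMatched[i] = true;
                    bMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both strings' matched characters in order and count mismatched pairs.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!aMatched[i]) {
                continue;
            }
            while (!bMatched[j]) {
                j++;
            }
            if (a.charAt(i) != b.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // transpositions are half the out-of-order pairs
        return (m / a.length() + m / b.length() + (m - t) / m) / 3.0;
    }

    // Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4 characters.
    static double similarity(String a, String b) {
        double jaro = jaro(a, b);
        if (jaro < 0.7) { // assumed boost threshold
            return jaro;
        }
        int prefix = 0;
        int limit = Math.min(4, Math.min(a.length(), b.length()));
        while (prefix < limit && a.charAt(prefix) == b.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro); // 0.1 is the assumed prefix scale
    }

    public static void main(String[] args) {
        String input = "this is good";
        for (String phrase : new String[] {"this is goodd", "this is good", "this is god"}) {
            System.out.println(phrase + " -> " + similarity(input, phrase));
        }
    }
}
```

Under these assumptions the identical phrase scores exactly 1.0, and both "this is goodd" and "this is god" score above 0.8 but below 1.0. That matches what you observed: the two near-misses pass the accuracy threshold, and the perfect match is missing only because suggestSimilar skips the input word itself.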