How to update Lucene SpellChecker indexes without reindexing?
I have a Lucene SpellChecker index implementation like this:
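import java.nio.file.Path
import org.apache.lucene.index.IndexWriterConfig
import org.apache.lucene.search.spell.{JaroWinklerDistance, PlainTextDictionary, SpellChecker}
import org.apache.lucene.store.FSDirectory

def buildAutoSuggestIndex(path: Path): SpellChecker = {
  val config = new IndexWriterConfig(new CustomAnalyzer())
  val dictionary = new PlainTextDictionary(path)
  // The index lives in the directory that contains the dictionary file.
  val directory = FSDirectory.open(path.getParent)
  val spellChecker = new SpellChecker(directory)
  val jw = new JaroWinklerDistance()
  jw.setThreshold(jaroWinklerThreshold)
  // Pass the configured instance; a fresh JaroWinklerDistance would ignore the threshold.
  spellChecker.setStringDistance(jw)
  spellChecker.indexDictionary(dictionary, config, true)
  spellChecker
}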
I need to update these SpellChecker dictionaries, that is, index only the new entries rather than rebuild the whole index. Is there any way to update a SpellChecker index?
SpellChecker.indexDictionary(...) already avoids re-indexing terms that are in the index, here:
terms:
while ((currentTerm = iter.next()) != null) {
  String word = currentTerm.utf8ToString();
  int len = word.length();
  if (len < 3) {
    continue; // too short we bail but "too long" is fine...
  }
  if (!isEmpty) {
    for (TermsEnum te : termsEnums) {
      if (te.seekExact(currentTerm)) {
        continue terms;
      }
    }
  }
  // ok index the word
  Document doc = createDocument(word, getMin(len), getMax(len));
  writer.addDocument(doc);
}
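seekExact returns true if the term is already in the index, in which case no document containing the term's n-grams is added (continue terms;). So calling indexDictionary again on the same directory, with a dictionary that holds the new entries, only adds the words that are missing, provided the IndexWriterConfig does not use OpenMode.CREATE, which would wipe the existing index.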
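To make the skip logic concrete, here is a toy model of that loop in plain Java. It is only a sketch: a TreeSet stands in for the on-disk spell index, and IncrementalDictionaryModel, addNewTerms and MIN_WORD_LENGTH are made-up names for illustration, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a Set stands in for the spell index. Not Lucene code.
final class IncrementalDictionaryModel {

    // Words shorter than this are skipped, like the "len < 3" check in the quoted loop.
    static final int MIN_WORD_LENGTH = 3;

    // Adds each sufficiently long word that is not yet in `indexed`
    // and returns the words that were actually added.
    static List<String> addNewTerms(Set<String> indexed, List<String> incoming) {
        List<String> added = new ArrayList<>();
        for (String word : incoming) {
            if (word.length() < MIN_WORD_LENGTH) {
                continue; // too short, skip
            }
            if (indexed.contains(word)) {
                continue; // plays the role of seekExact(...) == true, then "continue terms"
            }
            indexed.add(word); // plays the role of writer.addDocument(doc)
            added.add(word);
        }
        return added;
    }

    public static void main(String[] args) {
        Set<String> indexed = new TreeSet<>(List.of("lucene", "search"));
        List<String> added =
            addNewTerms(indexed, List.of("lucene", "spell", "to", "checker"));
        System.out.println(added);   // [spell, checker]
        System.out.println(indexed); // [checker, lucene, search, spell]
    }
}
```

In the model, "lucene" is skipped because it is already present and "to" is skipped because it is too short, so a second pass over an updated word list only pays for the entries that are new.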