How to seek to a term using a Lucene IndexReader?

Question

I'm trying to get a MultiPhraseQuery working with partial matching. According to the JavaDoc for MultiPhraseQuery:

A generalized version of PhraseQuery, with the possibility of adding more than one term at the same position that are treated as a disjunction (OR). To use this class to search for the phrase "Microsoft app*" first create a Builder and use MultiPhraseQuery.Builder.add(Term) on the term "microsoft" (assuming lowercase analysis), then find all terms that have "app" as prefix using LeafReader.terms(String), seeking to "app" then iterating and collecting terms until there is no longer that prefix, and finally use MultiPhraseQuery.Builder.add(Term[]) to add them. MultiPhraseQuery.Builder.build() returns the fully constructed (and immutable) MultiPhraseQuery.

https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/MultiPhraseQuery.html

I'm struggling with the part where it says:

...find all terms that have "app" as prefix using LeafReader.terms(String), seeking to "app" then iterating and collecting terms until there is no longer that prefix...

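How do you find the terms there? LeafReader.terms(String) gives you a Terms, which has an iterator method that gives you a TermsEnum, which you can seek on. I'm just not sure how to use it to extract the matching terms.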

Answer 1

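It sounds like you already know how to get the TermsEnum, so from there just use seekCeil to seek to the prefix you want to match, then iterate through the TermsEnum until you reach a term that doesn't match the prefix. For example: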

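Terms terms = MultiFields.getTerms(indexReader, "text");
List<Term> matchingTerms = new ArrayList<Term>();
if (terms != null) { // null when the field has no indexed terms
    TermsEnum termsEnum = terms.iterator();
    // seekCeil positions the enum on the first term >= "app", or reports END
    if (termsEnum.seekCeil(new BytesRef("app")) != TermsEnum.SeekStatus.END) {
        BytesRef term = termsEnum.term();
        // next() returns null once the terms are exhausted
        while (term != null && term.utf8ToString().startsWith("app")) {
            matchingTerms.add(new Term("text", term));
            term = termsEnum.next();
        }
    }
}
System.out.println(matchingTerms);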
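
To see the seek-then-scan pattern in isolation, without needing an index, here is a minimal sketch that uses a java.util.TreeSet as a stand-in for Lucene's sorted term dictionary. The class PrefixScanSketch and its method termsWithPrefix are hypothetical names made up for this illustration and are not part of Lucene. It also assumes plain ASCII terms, where Java's String ordering agrees with the byte ordering Lucene uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a term dictionary; not part of Lucene.
class PrefixScanSketch {

    // Collects every term in the sorted set that starts with the given prefix.
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) plays the role of seekCeil: iteration starts
        // at the smallest term that is >= prefix.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "app", "application", "banana", "ant", "apt"));
        System.out.println(termsWithPrefix(terms, "app")); // prints [app, apple, application]
    }
}
```

Stopping at the first non-matching term is safe only because the terms are visited in sorted order, so all terms sharing a prefix sit next to each other. In the Lucene version, the collected terms would then be converted to a Term[] and passed to MultiPhraseQuery.Builder.add(Term[]), as the JavaDoc quoted in the question describes.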
