Lucene phrase query does not work

Question

I can't figure out how to make a phrase query work. It returns exact matches, but the slop option doesn't seem to make any difference.
Here is my code:

static void Main(string[] args)
{
    using (Directory directory = new RAMDirectory())
    {
        Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29);

        using (IndexWriter writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // index a few documents
            writer.AddDocument(createDocument("1", "henry morgan"));
            writer.AddDocument(createDocument("2", "henry junior morgan"));
            writer.AddDocument(createDocument("3", "henry immortal jr morgan"));
            writer.AddDocument(createDocument("4", "morgan henry"));
        }

        // search for documents that have "henry morgan" in them
        String sentence = "henry morgan";
        IndexSearcher searcher = new IndexSearcher(directory, true);
        PhraseQuery query = new PhraseQuery()
        {
            // allow inverse order
            Slop = 3
        };

        query.Add(new Term("contents", sentence));

        // display search results
        List<string> results = new List<string>();
        Console.WriteLine("Looking for \"{0}\"...", sentence);
        TopDocs topDocs = searcher.Search(query, 100);
        foreach (ScoreDoc scoreDoc in topDocs.ScoreDocs)
        {
            var matchedContents = searcher.Doc(scoreDoc.Doc).Get("contents");
            results.Add(matchedContents);
            Console.WriteLine("Found: {0}", matchedContents);
        }
    }
}

private static Document createDocument(string id, string content)
{
    Document doc = new Document();
    doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("contents", content, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    return doc;
}

I expected every document except the one with id=3 to match, but only the first one does. Am I missing something?

Answer 1

From Lucene in Action, 2nd edition, section 3.4.6, "Searching by phrase: PhraseQuery":

PhraseQuery uses this information to locate documents where terms are within a certain distance of one another

Sure, a plain TermQuery would do the trick to locate this document knowing either of those words, but in this case we only want documents that have phrases where the words are either exactly side by side (quick fox) or have one word in between (quick [irrelevant] fox)
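The quoted passage says PhraseQuery works from term positions. As a minimal illustration, here is a toy positional inverted index in plain Java (no Lucene dependency) that maps each term to the documents and word positions where it occurs, using the four documents from the question. The class `PositionalIndex` is hypothetical, and the lowercase-and-split-on-whitespace tokenizer is an assumption standing in for StandardAnalyzer; Lucene's real index format is far more elaborate.

```java
import java.util.*;

// Toy positional inverted index: term -> (docId -> word positions).
// Illustration only; not Lucene's index format.
public class PositionalIndex {
    private final Map<String, Map<String, List<Integer>>> postings = new TreeMap<>();

    // Assumed tokenizer: lowercase + split on whitespace (stand-in for StandardAnalyzer).
    public void add(String docId, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Positions of `term` in `docId`, or an empty list if the term does not occur there.
    public List<Integer> positions(String term, String docId) {
        return postings.getOrDefault(term, Map.of()).getOrDefault(docId, List.of());
    }

    public static void main(String[] args) {
        PositionalIndex index = new PositionalIndex();
        index.add("1", "henry morgan");
        index.add("2", "henry junior morgan");
        index.add("3", "henry immortal jr morgan");
        index.add("4", "morgan henry");
        for (String id : List.of("1", "2", "3", "4")) {
            System.out.println(id + ": henry=" + index.positions("henry", id)
                    + " morgan=" + index.positions("morgan", id));
        }
    }
}
```

For document 4, for example, henry sits at position 1 and morgan at position 0; it is this kind of information that lets a phrase query check word order and distance at search time.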

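So PhraseQuery actually operates on individual terms, and the sample code in that chapter confirms this. Because you are using StandardAnalyzer, "henry morgan" is analyzed into two separate terms, henry and morgan. Therefore you cannot add "henry morgan" as a single Term.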
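To make the analysis point concrete, here is a minimal plain-Java sketch, assuming a simplified analyzer that only lowercases and splits on whitespace (the real StandardAnalyzer also handles punctuation and stop words). It collects every term that would be indexed for the question's documents and checks whether the whole string "henry morgan" is one of them. `AnalysisDemo`, `analyze` and `indexedTerms` are hypothetical names, not Lucene APIs.

```java
import java.util.*;

public class AnalysisDemo {
    // Simplified stand-in for StandardAnalyzer (assumption): lowercase + whitespace split.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Every distinct term that would end up in the "contents" field.
    static Set<String> indexedTerms(List<String> docs) {
        Set<String> terms = new TreeSet<>();
        for (String doc : docs) {
            terms.addAll(analyze(doc));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("henry morgan", "henry junior morgan",
                "henry immortal jr morgan", "morgan henry");
        Set<String> terms = indexedTerms(docs);
        System.out.println(terms);
        // A Term is matched verbatim, so the whole sentence is not an indexed term.
        System.out.println(terms.contains("henry morgan"));
        // The per-word terms are what the phrase query has to be built from.
        System.out.println(analyze("henry morgan"));
    }
}
```

Under this model the vocabulary contains only single words (henry, immortal, jr, junior, morgan), so a Term built from the whole sentence has nothing to match; the phrase has to be assembled one Term per word, as in the Scala snippet.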

/*
 * Sets the number of other words permitted between words
 * in query phrase. If zero, then this is an exact phrase search.
 */
public void setSlop(int s) { slop = s; }

The definition of setSlop clarifies things further. With a small change to your code, I got it working:

// code in Scala
val query = new PhraseQuery()
query.setSlop(3)
// add each analyzed word as its own Term
List("henry", "morgan").foreach { word =>
  query.add(new Term("contents", word))
}

In this case all four documents match. If you have further questions, I suggest reading that chapter of Lucene in Action, 2nd edition; it may help.
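To see why all four documents match with a slop of 3, here is a simplified model of sloppy phrase matching in plain Java. It is a sketch under stated assumptions, not Lucene's actual scorer: each query term's position is shifted back by its offset in the phrase, and a document matches when some choice of positions keeps the shifted values within `slop` of each other. It ignores repeated query terms and reuses a hypothetical whitespace tokenizer in place of StandardAnalyzer. `SlopDemo`, `requiredSlop` and `matches` are illustrative names.

```java
import java.util.*;

// Simplified model of sloppy phrase matching (assumption: distinct query terms,
// whitespace tokenizer). Not Lucene's implementation.
public class SlopDemo {
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Smallest slop at which `phrase` matches `doc`, or -1 if some term is missing.
    static int requiredSlop(List<String> phrase, List<String> doc) {
        return search(phrase, doc, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every position for phrase term i, tracking the min/max shifted position.
    private static int search(List<String> phrase, List<String> doc, int i, int min, int max) {
        if (i == phrase.size()) {
            return max - min; // spread of the shifted positions
        }
        int best = -1;
        for (int pos = 0; pos < doc.size(); pos++) {
            if (!doc.get(pos).equals(phrase.get(i))) {
                continue;
            }
            int shifted = pos - i; // where term i sits relative to its place in the phrase
            int d = search(phrase, doc, i + 1, Math.min(min, shifted), Math.max(max, shifted));
            if (d >= 0 && (best < 0 || d < best)) {
                best = d;
            }
        }
        return best;
    }

    static boolean matches(String phrase, String doc, int slop) {
        int d = requiredSlop(analyze(phrase), analyze(doc));
        return d >= 0 && d <= slop;
    }

    public static void main(String[] args) {
        String[] docs = {"henry morgan", "henry junior morgan",
                "henry immortal jr morgan", "morgan henry"};
        for (int id = 1; id <= docs.length; id++) {
            String doc = docs[id - 1];
            System.out.printf("doc %d: required slop %d, slop 0: %b, slop 3: %b%n",
                    id, requiredSlop(analyze("henry morgan"), analyze(doc)),
                    matches("henry morgan", doc, 0), matches("henry morgan", doc, 3));
        }
    }
}
```

Under this model the required slop for "henry morgan" is 0, 1, 2 and 2 for documents 1 to 4, so only document 1 matches at slop 0 and all four match at slop 3. Document 3 (two words in between) and the reversed document 4 both need a slop of 2.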


lucene

lucene.net