Make Lucene.net Highlighter show Matches from a Specific Field only

If I have a fullText field with this content

In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]

and the query

+loadTime:[2014 TO 2015] +fullText:sony

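the Highlighter picks 2014 and 2015 as the best fragments. How can I make the highlighter ignore matches from the loadTime part of the query and use only matches from the fullText part of the search? I want to see the ... sony ... fragment, even though it scores lower than the date part that (just happens to) match the full text.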

My code:

ScoreDoc[] hits = [create search];
IFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
QueryScorer scorer = new QueryScorer(query, );
Highlighter highlighter = new Highlighter(formatter, scorer);

for (int i = 0; i < hits.Length; i++)
{
    int docId = hits[i].Doc;
    float score = hits[i].Score;
    Document doc = search.Doc(docId);

    string fragments = string.Empty;
    if (collectFragments)
    {
        TokenStream stream = _analyzer.TokenStream("", new StringReader(doc.Get(AppConstants.Fields.FullText)));
        fragments = highlighter.GetBestFragments(stream, doc.Get(AppConstants.Fields.FullText), 2, "...");
    }

    ...
}

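The expression "+loadTime:[2014 TO 2015] +fullText:sony" seems to mean that you want to match documents whose load time is between 2014 and 2015 and whose full text contains sony. Sorry, I read Lucene in Action (3.1.2 Parsing a user-entered query expression: QueryParser) and looked at queryparsersyntax.html, but I did not find a way to write a query expression like yours. The closest is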

loadTime:[2014 TO 2015] AND fullText:sony

It may be a version issue; mine is Lucene 3.4.0. The way to solve the problem is probably the field parameter of QueryScorer:

/**
 * @param query Query to use for highlighting
 * @param field Field to highlight - pass null to ignore fields
*/
public QueryScorer(Query query, String field) {
  init(query, field, null, true);
}

In your code, I see that you did not fill in that parameter. I tried it in Scala in IntelliJ and it worked; here is the code.

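import java.io.StringReader
import org.apache.lucene.analysis.WhitespaceAnalyzer
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.queryParser.QueryParser
import org.apache.lucene.search.highlight.{Highlighter, QueryScorer, SimpleHTMLFormatter, SimpleSpanFragmenter}
import org.apache.lucene.util.Version

def singleFieldHighlighter = {
    val version = Version.LUCENE_30
    val textToDivide = "In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]"
    val tokenStream = new StandardAnalyzer(version).tokenStream("fullText", new StringReader(textToDivide))
    val searchString = "loadTime:[2014 TO 2015] AND fullText:sony"
    val parser = new QueryParser(version, "fullText", new WhitespaceAnalyzer(version))
    val parsedQuery = parser.parse(searchString)

    // Passing the field name makes the scorer ignore query terms from other fields (here loadTime)
    val scorer = new QueryScorer(parsedQuery, "fullText")
    val formatter = new SimpleHTMLFormatter("<span class='highlight'>", "</span>")
    val highlighter = new Highlighter(formatter, scorer)
    highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer))
    highlighter.getBestFragments(tokenStream, textToDivide, 3, "...")
}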


// The result
// In 2014 and 2015 the results were ... [more] ... and <span class='highlight'>Sony</span> are developing ... [more]
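
Applied to the C# code in the question, the same change might look roughly like the sketch below. I have not run this against Lucene.Net; it assumes the Lucene.Net 3.0.3 contrib Highlighter (namespace Lucene.Net.Search.Highlight), whose QueryScorer mirrors the Java (Query, String) constructor shown above. The FieldHighlighting class and BestFragments method are hypothetical names used only for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Search;
using Lucene.Net.Search.Highlight;

public static class FieldHighlighting
{
    // Hypothetical helper (not part of Lucene.Net): highlights `text` using only
    // the parts of `query` that target `fieldName`.
    public static string BestFragments(Query query, Analyzer analyzer, string fieldName, string text)
    {
        IFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");

        // Supplying the field name tells the scorer to ignore query terms that
        // belong to other fields, such as the loadTime range clause.
        QueryScorer scorer = new QueryScorer(query, fieldName);
        Highlighter highlighter = new Highlighter(formatter, scorer);

        // Tokenize the stored text under the same field name, for consistency
        // with the field the scorer filters on.
        TokenStream stream = analyzer.TokenStream(fieldName, new StringReader(text));

        // Same arguments as in the question: at most 2 fragments, joined by "...".
        return highlighter.GetBestFragments(stream, text, 2, "...");
    }
}
```

Inside the loop from the question, the call would then be something like fragments = FieldHighlighting.BestFragments(query, _analyzer, AppConstants.Fields.FullText, doc.Get(AppConstants.Fields.FullText)); assuming AppConstants.Fields.FullText holds the indexed field name "fullText".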

Hope this helps. If not, I suggest reading the Lucene in Action chapter I mentioned above to figure it out.