Make Lucene.net Highlighter show Matches from a Specific Field only
If I have a fullText field with this content:
In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]
and the query
+loadTime:[2014 TO 2015] +fullText:sony
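the Highlighter picks the fragment containing 2014 and 2015 as the best one. How can I make the Highlighter ignore matches from the loadTime part of the query and use only matches from the fullText part? I want to see the ... sony ... fragment, even if it scores lower than the date part that (just happened to) match the full text.
My code: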
ScoreDoc[] hits = [create search];
IFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
QueryScorer scorer = new QueryScorer(query);
Highlighter highlighter = new Highlighter(formatter, scorer);
for (int i = 0; i < hits.Length; i++)
{
    int docId = hits[i].Doc;
    float score = hits[i].Score;
    Document doc = search.Doc(docId);
    string fragments = string.Empty;
    if (collectFragments)
    {
        // Re-analyze the stored full text so the highlighter can locate term offsets
        string fullText = doc.Get(AppConstants.Fields.FullText);
        TokenStream stream = _analyzer.TokenStream("", new StringReader(fullText));
        fragments = highlighter.GetBestFragments(stream, fullText, 2, "...");
    }
    ...
}
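The expression "+loadTime:[2014 TO 2015] +fullText:sony" means you want documents whose loadTime is between 2014 and 2015 and whose full text contains sony.
I checked Lucene in Action (section 3.1.2, "Parsing a user-entered query expression: QueryParser") and queryparsersyntax.html. The form I'm more used to writing is
loadTime:[2014 TO 2015] AND fullText:sony
With both clauses required, this is equivalent to your "+"-prefixed expression. I'm on Lucene 3.4.0, so details may differ from your Lucene.Net version.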
The solution is probably QueryScorer's two-argument constructor, which takes the field to highlight:
/**
* @param query Query to use for highlighting
* @param field Field to highlight - pass null to ignore fields
*/
public QueryScorer(Query query, String field) {
init(query, field, null, true);
}
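To make the effect of the field argument concrete, here is a deliberately simplified, dependency-free Java model. It is not Lucene's implementation: it assumes the range clause has already been reduced to the terms 2014 and 2015, and it scores a fragment just by counting distinct matched terms. The Clause class and all method names are hypothetical. Passing null keeps the terms of every clause (the "ignore fields" case from the Javadoc above), so the date fragment wins; passing "fullText" keeps only sony.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class FieldRestrictedHighlightSketch {

    /** Hypothetical stand-in for one query clause: a field plus an already-extracted term. */
    static final class Clause {
        final String field;
        final String term;

        Clause(String field, String term) {
            this.field = field;
            this.term = term;
        }
    }

    /**
     * Collects the terms used for highlighting. A null field keeps terms from every clause;
     * a non-null field keeps only that field's terms (the idea behind QueryScorer's field argument).
     */
    static Set<String> highlightTerms(List<Clause> clauses, String field) {
        Set<String> terms = new LinkedHashSet<>();
        for (Clause c : clauses) {
            if (field == null || field.equals(c.field)) {
                terms.add(c.term.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Crude fragment score (an assumption, not Lucene's scoring): distinct highlight terms present. */
    static int score(String fragment, Set<String> terms) {
        Set<String> seen = new LinkedHashSet<>();
        for (String token : fragment.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (terms.contains(token)) {
                seen.add(token);
            }
        }
        return seen.size();
    }

    /** Returns the highest-scoring fragment (assumes a non-empty list); ties go to the earliest. */
    static String bestFragment(List<String> fragments, Set<String> terms) {
        String best = fragments.get(0);
        int bestScore = score(best, terms);
        for (String f : fragments) {
            int s = score(f, terms);
            if (s > bestScore) {
                best = f;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumption: the loadTime range has already been expanded to the terms 2014 and 2015
        List<Clause> query = List.of(
                new Clause("loadTime", "2014"),
                new Clause("loadTime", "2015"),
                new Clause("fullText", "sony"));
        List<String> fragments = List.of(
                "In 2014 and 2015 the results were",
                "and Sony are developing");
        System.out.println(bestFragment(fragments, highlightTerms(query, null)));
        System.out.println(bestFragment(fragments, highlightTerms(query, "fullText")));
    }
}
```

Running main prints the date fragment first (no field restriction) and the Sony fragment second (restricted to fullText).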
In your code, no field argument is passed to QueryScorer. I tried it in Scala in IntelliJ and it worked; here is the code:
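import java.io.StringReader
import org.apache.lucene.analysis.WhitespaceAnalyzer
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.queryParser.QueryParser
import org.apache.lucene.search.highlight.{Highlighter, QueryScorer, SimpleHTMLFormatter, SimpleSpanFragmenter}
import org.apache.lucene.util.Version

def singleFieldHighlighter = {
  val version = Version.LUCENE_30
  val textToDivide = "In 2014 and 2015 the results were ... [more] ... and Sony are developing ... [more]"
  val tokenStream = new StandardAnalyzer(version).tokenStream("fullText", new StringReader(textToDivide))
  val searchString = "loadTime:[2014 TO 2015] AND fullText:sony"
  val parser = new QueryParser(version, "fullText", new WhitespaceAnalyzer(version))
  val parsedQuery = parser.parse(searchString)
  // Only terms from the fullText field are used for highlighting; loadTime matches are ignored
  val scorer = new QueryScorer(parsedQuery, "fullText")
  val formatter = new SimpleHTMLFormatter("<span class='highlight'>", "</span>")
  val highlighter = new Highlighter(formatter, scorer)
  highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer))
  highlighter.getBestFragments(tokenStream, textToDivide, 3, "...")
}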
// The result
// In 2014 and 2015 the results were ... [more] ... and <span class='highlight'>Sony</span> are developing ... [more]
Hope this helps. If not, I suggest reading the Lucene in Action chapter mentioned above to figure it out.