How is Lucene boosting affected by lengthNorm similarity?

Question

I have two documents containing:

doc_1: one two three four five Bingo

doc_2: Bingo one two three four five

I index each document into two fields: one field contains the first 5 terms, and the second field contains the last term.

TextField start_field = new TextField("start_words", content.substring(0, index), Field.Store.NO);
TextField end_field = new TextField("end_words", content.substring(index, content.length() - 1), Field.Store.NO);
// index is the position of the 5th ' ' in content
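For readers who want to reproduce the split, here is a minimal sketch of how index could be computed and how the two substrings fall out. The class SplitSketch and the helper indexOfNth are hypothetical names, not part of the question. The sketch also assumes content has no trailing character to trim, so it uses substring(index + 1) rather than the bounds shown above.

```java
// Hypothetical helper for illustration; not taken from the original question.
final class SplitSketch {
    private SplitSketch() {}

    // Returns the position of the n-th occurrence of ch in s, or -1 if there are fewer than n.
    static int indexOfNth(String s, char ch, int n) {
        int pos = -1;
        for (int i = 0; i < n; i++) {
            pos = s.indexOf(ch, pos + 1);
            if (pos < 0) {
                return -1;
            }
        }
        return pos;
    }

    // Splits content at the 5th space: [first five terms, remaining text].
    // Assumption: content has no trailing character that needs trimming.
    static String[] split(String content) {
        int index = indexOfNth(content, ' ', 5);
        if (index < 0) {
            // Fewer than five spaces: everything goes into the first part.
            return new String[] {content, ""};
        }
        String start = content.substring(0, index);
        String end = content.substring(index + 1); // skip the separating space
        return new String[] {start, end};
    }
}
```

With this split, Bingo lands in the second part for doc_1 and in the first part for doc_2, which is the layout the question relies on.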

To see the effect of boosting more clearly, I implemented the following similarity:

DefaultSimilarity customSimilarity = new DefaultSimilarity() {
     @Override
     public float lengthNorm(FieldInvertState state) {
         return 1; // So length of each field would not matter
     }
};

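Without any boost applied, searching for Bingo gives both documents the same score (as expected). However, when I boost one of the fields (start_field.setBoost(5)), the two scores stay the same, even though the field in which doc_2 contains Bingo is the boosted one.

If I remove customSimilarity, boosting works as expected.

Why does overriding lengthNorm stop the boost from taking effect, and how can I make boosting work together with this overridden similarity?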

Answer 1

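The default implementation of lengthNorm() in DefaultSimilarity is state.getBoost() * lengthNorm(numTerms), that is, the index-time field boost multiplied by a length factor. The boost therefore reaches the score only through the value that lengthNorm() returns.

Your implementation returns a constant and does not take the boost into account. To make the boost count while still ignoring field length, you can simply return state.getBoost().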
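To make the arithmetic concrete, the following sketch models the three variants as plain Java functions. It does not use Lucene's classes: NormSketch and its method names are hypothetical, and the 1/sqrt(numTerms) length factor is stated here as an assumption about the shape of the DefaultSimilarity norm. The inputs mirror the question: doc_2 has Bingo in a 5-term field boosted by 5, and doc_1 has Bingo in a 1-term field with the default boost of 1.

```java
// Illustrative model only: mirrors the arithmetic of the norm, not Lucene's API.
final class NormSketch {
    private NormSketch() {}

    // Assumed shape of the default norm: index-time boost times 1/sqrt(numTerms).
    static float defaultNorm(float boost, int numTerms) {
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // The override from the question: a constant, so the boost is discarded.
    static float constantNorm(float boost, int numTerms) {
        return 1f;
    }

    // The suggested fix: ignore field length but keep the boost.
    static float boostOnlyNorm(float boost, int numTerms) {
        return boost;
    }

    public static void main(String[] args) {
        float boosted = 5f;    // start_words in doc_2 after setBoost(5)
        float unboosted = 1f;  // end_words in doc_1, default boost

        System.out.println("default:    " + defaultNorm(boosted, 5) + " vs " + defaultNorm(unboosted, 1));
        System.out.println("constant:   " + constantNorm(boosted, 5) + " vs " + constantNorm(unboosted, 1));
        System.out.println("boost only: " + boostOnlyNorm(boosted, 5) + " vs " + boostOnlyNorm(unboosted, 1));
    }
}
```

In this model the constant variant produces identical values for both documents, which matches the equal scores seen in the question, while the boost-only variant separates them by exactly the boost factor.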

java

lucene

similarity

solr-boost