Solr - Change how score is calculated? (Sum instead of Max)
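We are running into some issues with our Solr results. In this particular example, product A is shown above product B. Product A's title contains the search term. Product B's title also contains the search term, and so do its description and category name. Logically, product B should be more relevant and appear above product A, but it does not.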
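The schema is configured to take all of these additional fields into account. After analyzing the debug output of the query with ...&debugQuery=true&debug.explain.structured=true, both products appear to end up with the same score. Looking further, I can see that scores were calculated for these additional fields. For some reason, however, the parser takes only the maximum of these scores rather than their sum, which is why the two products end up tied.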
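Is there a reason Solr does this? Is there any way to change this behavior so that it uses the sum instead of the maximum (as the parent element in the debug output does)?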
You can control how the score is calculated with the tie parameter, provided you are using the DisMax or eDisMax query parser.
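A minimal sketch of what such a request could look like, built in Python with only the standard library. It assumes eDisMax and a Solr instance at http://localhost:8983/solr. The core name products, the query text laptop and the field names title, description and category_name are hypothetical stand-ins for the question's schema. tie=1.0 asks for a pure sum, and the debug flags are the same ones used in the question.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, user_query, fields, tie=1.0, debug=False):
    """Build a Solr /select URL that uses eDisMax with the given tie value.

    base_url, core and fields are assumptions for illustration only.
    """
    params = {
        "defType": "edismax",   # use the eDisMax query parser
        "q": user_query,
        "qf": " ".join(fields),  # fields the user's terms are matched against
        "tie": str(tie),         # 0.0 = pure max, 1.0 = pure sum
    }
    if debug:
        # Same debug parameters the question used to inspect the scores
        params["debugQuery"] = "true"
        params["debug.explain.structured"] = "true"
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical core and field names
url = build_select_url(
    "http://localhost:8983/solr",
    "products",
    "laptop",
    ["title", "description", "category_name"],
    tie=1.0,
    debug=True,
)
print(url)
```

Re-running the debug query with a non-zero tie should show the lower-scoring field matches contributing to the parent score instead of being discarded.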
The Solr documentation explains it well:
The tie parameter specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries.
When a term from the user’s input is tested against multiple fields, more than one field may match. If so, each field will generate a different score based on how common that word is in that field (for each document relative to all other documents).
The tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.
A value of "0.0" - the default - makes the query a pure "disjunction max query": that is, only the maximum scoring subquery contributes to the final score.
A value of "1.0" makes the query a pure "disjunction sum query" where it doesn’t matter what the maximum scoring sub query is, because the final score will be the sum of the subquery scores. Typically a low value, such as 0.1, is useful.
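To make the effect of tie concrete, here is a small sketch of the combination rule used by Lucene's disjunction-max query: the best field score plus tie times the sum of the other matching field scores. The per-field scores below are made-up numbers chosen to mirror the question (product A matches only in the title; product B matches in the title, description and category). Real scores come from Solr's similarity (e.g. BM25), so actual values will differ.

```python
def dismax_combine(field_scores, tie):
    """Combine per-field scores like a disjunction-max query:
    the maximum score plus `tie` times the sum of the remaining scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical per-field scores for a single search term
product_a = [2.0]             # title only
product_b = [2.0, 0.5, 0.75]  # title, description, category name

for tie in (0.0, 0.1, 1.0):
    print(tie, dismax_combine(product_a, tie), dismax_combine(product_b, tie))
```

With tie=0.0 both products get the title score only, which is the tie described in the question. Any positive tie lets product B's extra matches push it above product A, and tie=1.0 reduces to a plain sum.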