Using Solr, what is the correct way to "add boosts" instead of using the "max" boost?
Using the debug-query feature and looking at the "explain" section, I realised that the boosts I have been using take the "max of" the per-field scores when comparing how a result matches each field. In my system I have 10 fields that are boosted by various amounts. I then sort the results by score descending, but I assumed that score would be the total of the scores earned on every field the result matched. I hadn't realised the score is set to the maximum score computed for any single boosted field. If I want to prioritise a result that matches all 10 of my fields with a higher total (e.g. 500) over a result that matches only 1 of my fields (e.g. 100), I'm not sure how I would go about it.
An example of what it looks like:
320.3237 = sum of:
0.0069028055 = weight(custom_app:test in 7918) [SchemaSimilarity], result of:
0.0069028055 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
0.006641347 = idf(docFreq=48698, docCount=49022)
1.0393683 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
1.1020359 = avgFieldLength
1.0 = fieldLength
320.3168 = max of:
73.23891 = weight(name_autocomplete:james in 7918) [SchemaSimilarity], result of:
73.23891 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
6.066 = boost
7.8911004 = idf(docFreq=32, docCount=86884)
1.5300368 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
6.527704 = avgFieldLength
1.0 = fieldLength
51.871056 = weight(name_partial_match:colin in 7918) [SchemaSimilarity], result of:
51.871056 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
4.05 = boost
7.8603234 = idf(docFreq=33, docCount=86843)
1.6294072 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
17.933905 = avgFieldLength
1.0 = fieldLength
9.736896 = weight(custom_name_phonetic_en:KLN in 7918) [SchemaSimilarity], result of:
9.736896 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
1.6875 = boost
5.4820786 = idf(docFreq=361, docCount=86884)
1.0525228 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.9156578 = avgFieldLength
2.56 = fieldLength
61.69854 = weight(custom_display_name_partial_match:colin in 7918) [SchemaSimilarity], result of:
61.69854 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
5.0625 = boost
7.532877 = idf(docFreq=46, docCount=86883)
1.61789 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
38.531185 = avgFieldLength
2.56 = fieldLength
86.66015 = weight(custom_name_autocomplete:colin in 7918) [SchemaSimilarity], result of:
86.66015 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
7.5825 = boost
7.6228366 = idf(docFreq=42, docCount=86884)
1.4993064 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
13.767955 = avgFieldLength
2.56 = fieldLength
9.267912 = weight(name_phonetic_en:KLN in 7918) [SchemaSimilarity], result of:
9.267912 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
1.35 = boost
6.1070633 = idf(docFreq=193, docCount=86884)
1.1241279 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
1.3697113 = avgFieldLength
1.0 = fieldLength
320.3168 = weight(name_lowercase:colin in 7918) [SchemaSimilarity], result of:
320.3168 = score(doc=7918,freq=1.0 = termFreq=1.0
), product of:
40.1 = boost
7.9879503 = idf(docFreq=29, docCount=86884)
1.0 = tfNorm, computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
1.0 = avgFieldLength
1.0 = fieldLength
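To make the issue concrete, the numbers from the explain output above can be recombined by hand: the total is the custom_app weight plus the *maximum* of the boosted field scores, not their sum. A quick sanity check (field scores copied from the "max of" block):

```python
# Per-field scores taken from the "max of" block in the explain output above
field_scores = [
    73.23891,    # name_autocomplete:james
    51.871056,   # name_partial_match:colin
    9.736896,    # custom_name_phonetic_en:KLN
    61.69854,    # custom_display_name_partial_match:colin
    86.66015,    # custom_name_autocomplete:colin
    9.267912,    # name_phonetic_en:KLN
    320.3168,    # name_lowercase:colin
]
custom_app_score = 0.0069028055  # weight(custom_app:test)

# Solr takes the maximum, not the sum, of the boosted fields
total = custom_app_score + max(field_scores)
print(round(total, 4))  # 320.3237 - matches the top line of the explain output
```

Note how the six other field matches contribute nothing: summing them instead would have added roughly another 292 points to the score.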
If you want to include a fraction of the other scores - in addition to the maximum-scoring query - you can use the tie parameter. It tells Solr how much of the score from the other fields that also produced a hit should be included in the final score. It is usually a low value, such as 0.1.
The tie parameter specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries.
When a term from the user’s input is tested against multiple fields, more than one field may match. If so, each field will generate a different score based on how common that word is in that field (for each document relative to all other documents). The tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower scoring fields compared to the highest scoring field.
A value of "0.0" - the default - makes the query a pure "disjunction max query": that is, only the maximum scoring subquery contributes to the final score. A value of "1.0" makes the query a pure "disjunction sum query" where it doesn’t matter what the maximum scoring sub query is, because the final score will be the sum of the subquery scores. Typically a low value, such as 0.1, is useful.
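The combination described above can be sketched as a small function - a simplified model of how Lucene's DisjunctionMaxQuery folds the tie parameter in: final score = best field score + tie × (sum of the other field scores).

```python
def dismax_score(field_scores, tie=0.0):
    """Simplified model of DisMax score combination: the best field's
    score plus `tie` times the scores of the remaining fields."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [100.0, 50.0, 10.0]
print(dismax_score(scores, tie=0.0))  # 100.0 - pure "max of" (the default)
print(dismax_score(scores, tie=1.0))  # 160.0 - pure sum of all subqueries
print(dismax_score(scores, tie=0.1))  # 106.0 - max plus 10% of the rest
```

In practice you pass tie on the request alongside your existing boosts, e.g. `defType=edismax&qf=name_lowercase^40.1 name_autocomplete^6.066 ...&tie=0.1` (field boosts here are taken from your explain output for illustration).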