Elasticsearch Score Calculations
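I am migrating from ES 1.7 to ES 6.5. The data source is the same, but when I search for a given keyword the two versions return different scores. Because only the top-scoring hit is selected, they also return different results.
I used explain in Elasticsearch to get the details of the score calculation. Below are the query and the explanation for the same keyword in both indices.
Query used: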
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match": {
          "search": {
            "query": "san"
          }
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "related.score"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 1
}
Mapping for ES 1.7:
{
  "_id": {
    "path": "search"
  },
  "properties": {
    "related": {
      "properties": {
        "category": {
          "type": "long"
        },
        "score": {
          "type": "double"
        },
        "search": {
          "type": "string"
        }
      }
    },
    "search": {
      "type": "string",
      "analyzer": "english"
    }
  }
}
ES 1.7 query explanation:
{
  "_explanation": {
    "value": 4.83643,
    "description": "function score, product of:",
    "details": [
      {
        "value": 4.8384395,
        "description": "weight(search:san in 11405) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 4.8384395,
            "description": "fieldWeight in 11405, product of:",
            "details": [
              {
                "value": 1,
                "description": "tf(freq=1.0), with freq of:",
                "details": [
                  {
                    "value": 1,
                    "description": "termFreq=1.0"
                  }
                ]
              },
              {
                "value": 4.8384395,
                "description": "idf(docFreq=1072, maxDocs=49844)"
              },
              {
                "value": 1,
                "description": "fieldNorm(doc=11405)"
              }
            ]
          }
        ]
      },
      {
        "value": 0.99958473,
        "description": "Math.min of",
        "details": [
          {
            "value": 0.99958473,
            "description": "field value function: (doc['related.score'].value * factor=1.0)"
          },
          {
            "value": 3.4028235e+38,
            "description": "maxBoost"
          }
        ]
      },
      {
        "value": 1,
        "description": "queryBoost"
      }
    ]
  }
}
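To check my reading of the ES 1.7 explanation, here is a minimal sketch that recomputes the score by hand. The explain output does not print the tf and idf formulas, so I am assuming Lucene's classic TF-IDF definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are my own, not part of any Elasticsearch API. With the numbers from the explanation this assumption lands on the reported values to within float rounding.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed classic TF-IDF term frequency: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic TF-IDF inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Values copied from the ES 1.7 explanation above.
es17_idf = classic_idf(doc_freq=1072, max_docs=49844)
es17_field_weight = classic_tf(1.0) * es17_idf * 1.0  # tf * idf * fieldNorm

# function_score: query score * min(field value, maxBoost) * queryBoost
es17_function = min(0.99958473, 3.4028235e38)
es17_score = es17_field_weight * es17_function * 1.0

print(f"idf={es17_idf:.5f} score={es17_score:.5f}")
```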
Mapping for ES 6.5:
{
  "mappings": {
    "searches": {
      "properties": {
        "related": {
          "properties": {
            "category": {
              "type": "long"
            },
            "score": {
              "type": "double"
            },
            "search": {
              "type": "text"
            }
          }
        },
        "search": {
          "type": "text",
          "analyzer": "english"
        }
      }
    }
  }
}
ES 6.5 query explanation:
{
  "_explanation": {
    "value": 5.1439505,
    "description": "function score, product of:",
    "details": [
      {
        "value": 5.1460876,
        "description": "weight(search:san in 2464) [PerFieldSimilarity], result of:",
        "details": [
          {
            "value": 5.1460876,
            "description": "score(doc=2464,freq=1.0 = termFreq=1.0\n), product of:",
            "details": [
              {
                "value": 3.82669,
                "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                "details": [
                  {
                    "value": 5419,
                    "description": "docFreq",
                    "details": []
                  },
                  {
                    "value": 248810,
                    "description": "docCount",
                    "details": []
                  }
                ]
              },
              {
                "value": 1.3447882,
                "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                "details": [
                  {
                    "value": 1,
                    "description": "termFreq=1.0",
                    "details": []
                  },
                  {
                    "value": 1.2,
                    "description": "parameter k1",
                    "details": []
                  },
                  {
                    "value": 0.75,
                    "description": "parameter b",
                    "details": []
                  },
                  {
                    "value": 2.679008,
                    "description": "avgFieldLength",
                    "details": []
                  },
                  {
                    "value": 1,
                    "description": "fieldLength",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 0.99958473,
        "description": "min of:",
        "details": [
          {
            "value": 0.99958473,
            "description": "field value function: none(doc['related.score'].value * factor=1.0)",
            "details": []
          },
          {
            "value": 3.4028235e+38,
            "description": "maxBoost",
            "details": []
          }
        ]
      }
    ]
  }
}
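The ES 6.5 explanation prints its idf and tfNorm formulas, so they can be evaluated directly. The sketch below plugs in the values reported above; the function and variable names are my own, and the natural logarithm is an assumption that happens to reproduce the reported idf.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    # idf formula as printed in the ES 6.5 explanation (natural log assumed).
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq: float, k1: float, b: float,
                 field_length: float, avg_field_length: float) -> float:
    # tfNorm formula as printed in the ES 6.5 explanation.
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))


# Values copied from the ES 6.5 explanation above.
es65_idf = bm25_idf(doc_freq=5419, doc_count=248810)
es65_tf_norm = bm25_tf_norm(freq=1.0, k1=1.2, b=0.75,
                            field_length=1.0, avg_field_length=2.679008)
es65_query_score = es65_idf * es65_tf_norm

# function_score: query score * min(field value, maxBoost)
es65_score = es65_query_score * min(0.99958473, 3.4028235e38)

print(f"idf={es65_idf:.5f} tfNorm={es65_tf_norm:.5f} score={es65_score:.5f}")
```

Note that tfNorm is above 1 here only because this field (length 1) is shorter than the average field length (about 2.68), which is something the ES 1.7 calculation does not reward in the same way.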
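Comparing the two explanations, the score is computed differently in each version: ES 1.7 multiplies tf, idf, and fieldNorm, while ES 6.5 multiplies an idf term by a tfNorm term that depends on k1, b, and the field length. The index statistics also differ (docFreq=1072 out of maxDocs=49844 in ES 1.7, docFreq=5419 out of docCount=248810 in ES 6.5). The query uses size=1, so it should return the record with the highest score, but because the scoring method changed, the same keyword scores differently in ES 1.7 and ES 6.5, and a different record ends up with the top score.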
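To separate the effect of the formula from the effect of the index statistics, here is a rough sketch that applies one idf formula to both sets of statistics. It assumes the classic idf definition 1 + ln(maxDocs / (docFreq + 1)), and it treats the ES 6.5 docCount as a stand-in for maxDocs, which is only an approximation. The names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic TF-IDF inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Statistics reported by each explanation for the term "san".
idf_with_es17_stats = classic_idf(doc_freq=1072, max_docs=49844)
# Approximation: docCount from ES 6.5 used in place of maxDocs.
idf_with_es65_stats = classic_idf(doc_freq=5419, max_docs=248810)

stats_gap = idf_with_es17_stats - idf_with_es65_stats
print(f"{idf_with_es17_stats:.4f} vs {idf_with_es65_stats:.4f}, gap {stats_gap:.4f}")
```

Under these assumptions the two values are close but not equal, so even with an identical formula the scores would probably not match exactly unless the statistics behind each index were also the same.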
Can anyone help me work out how to get the same scores in both versions?