Understanding Elasticsearch query explain
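I am trying to understand the scoring reported by the explain API in the Elastic documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
After I could not work out the numbers on my own simple index with a few documents, I tried to reproduce the calculation shown on the documentation page above.
The example shows a "value" of 1.3862944 with the description "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))". Under "details" it gives the following field values: docFreq: 1.0, docCount: 5.0
Using the given docFreq and docCount values, I compute log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5)) = 0.602, which differs from the 1.3862944 in the example.
I cannot get any of the values to match.
Am I reading this wrong?
Here is the full example from the documentation: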
GET /twitter/_doc/0/_explain
{
"query" : {
"match" : { "message" : "elasticsearch" }
}
}
This yields the following result:
{
"_index": "twitter",
"_type": "_doc",
"_id": "0",
"matched": true,
"explanation": {
"value": 1.6943599,
"description": "weight(message:elasticsearch in 0) [PerFieldSimilarity], result of:",
"details": [
{
"value": 1.6943599,
"description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
"details": [
{
"value": 1.3862944, <== This is the one I am trying
"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
"details": [
{
"value": 1.0,
"description": "docFreq",
"details": []
},
{
"value": 5.0,
"description": "docCount",
"details": []
}
]
},
{
"value": 1.2222223,
"description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
"details": [
{
"value": 1.0,
"description": "termFreq=1.0",
"details": []
},
{
"value": 1.2,
"description": "parameter k1",
"details": []
},
{
"value": 0.75,
"description": "parameter b",
"details": []
},
{
"value": 5.4,
"description": "avgFieldLength",
"details": []
},
{
"value": 3.0,
"description": "fieldLength",
"details": []
}
]
}
]
}
]
}
}
The explain output is accurate, as always. Let me walk you through the calculation.
Here is the initial formula:
log(1 + (5.0 - 1.0 + 0.5) / (1.0 + 0.5))
The next step is:
log(1 + 4.5 / 1.5)
And one more:
log(4) = ?
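Here comes the tricky part. You treated this log as a base-10 logarithm, and log10(4) ≈ 0.602. However, if you look at the code of the Lucene scorer, you will see that it is ln, the natural logarithm, and ln(4) ≈ 1.386294, which is exactly the value in the explain output.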
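To make the difference between the two bases concrete, here is a minimal sketch that evaluates the idf expression from the explain description with both Math.log and Math.log10. The class name IdfCheck and its two helper methods are made up for this illustration and are not part of Lucene or Elasticsearch.

```java
import java.util.Locale;

// Illustrative only: recompute the idf from the explain output with two log bases.
class IdfCheck {
    // The idf expression as printed in the explain description.
    // Math.log is the natural logarithm (base e).
    static double idf(double docFreq, double docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // The same expression with a base-10 logarithm, which is what the
    // "log" key on most calculators computes.
    static double idfBase10(double docFreq, double docCount) {
        return Math.log10(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        // docFreq = 1.0 and docCount = 5.0 are taken from the explain output.
        System.out.printf(Locale.ROOT, "ln:    %.7f%n", idf(1.0, 5.0));
        System.out.printf(Locale.ROOT, "log10: %.7f%n", idfBase10(1.0, 5.0));
    }
}
```

With docFreq = 1.0 and docCount = 5.0, the natural-log version gives about 1.3862944 and the base-10 version gives about 0.60206, which matches the two numbers in the question.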
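Part of the Lucene code (this excerpt appears to come from the classic TF-IDF similarity rather than the BM25 one; the point is that the idf is computed with Math.log):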
public float idf(long docFreq, long numDocs) {
return (float)(Math.log(numDocs/(double)(docFreq+1)) + 1.0);
}
where Math.log is defined as follows:
public static double log(double a)
Returns the natural logarithm (base e) of a double value.
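The rest of the explain output can be checked the same way. The sketch below plugs the values from the "details" sections into the two formulas quoted in the descriptions and multiplies the results, as the "product of:" line indicates. The class Bm25ScoreCheck and its method names are hypothetical helpers written for this example, and it works in double precision while the explain output comes from float arithmetic, so the last digit can differ.

```java
import java.util.Locale;

// Illustrative only: reproduce the numbers from the explain output by hand.
class Bm25ScoreCheck {
    // idf formula from the explain description; Math.log is the natural log.
    static double idf(double docFreq, double docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tfNorm formula from the explain description.
    static double tfNorm(double freq, double k1, double b,
                         double fieldLength, double avgFieldLength) {
        return (freq * (k1 + 1))
                / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }

    // The explain output describes the score as the product of idf and tfNorm.
    static double score(double idf, double tfNorm) {
        return idf * tfNorm;
    }

    public static void main(String[] args) {
        // All inputs below are copied from the "details" entries above.
        double idf = idf(1.0, 5.0);
        double tfNorm = tfNorm(1.0, 1.2, 0.75, 3.0, 5.4);
        System.out.printf(Locale.ROOT, "idf:    %.5f%n", idf);
        System.out.printf(Locale.ROOT, "tfNorm: %.5f%n", tfNorm);
        System.out.printf(Locale.ROOT, "score:  %.5f%n", score(idf, tfNorm));
    }
}
```

Run as written, this gives roughly 1.38629 for idf, 1.22222 for tfNorm, and 1.69436 for the score, in line with the 1.3862944, 1.2222223, and 1.6943599 reported by the explain API.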