Elasticsearch top_hits aggregation result and doc_count are different
Query
GET /someindex/_search
{
"size": 0,
"query": {
"ids": {
"types": [],
"values": ["08a2","08a3","03a2","03a3","84a1"]
}
},
"aggregations": {
"498": {
"terms": {
"field": "holderInfo.raw",
"size": 50
},
"aggregations": {
"tops": {
"top_hits": {
"_source": {
"includes": ["uid"]
}
}
}
}
}
}
}
Result
{
...
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"498": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "MATSUSHITA ELECTRIC INDUSTRIAL",
"doc_count": 5,
"tops": {
"hits": {
"total": 5,
"max_score": 1,
"hits": [
{
"_index": "someindex",
"_id": "03a3",
"_score": 1,
"_source": {
"uid": "03a3"
}
},
{
"_index": "someindex",
"_id": "08a2",
"_score": 1,
"_source": {
"uid": "08a2"
}
},
{
"_index": "someindex",
"_id": "84a1",
"_score": 1,
"_source": {
"uid": "84a1"
}
}
]
}
}
}
]
}
}
}
"08a2", "08a3", "03a2", "03a3" and "84a1" all clearly have 'MATSUSHITA ELECTRIC INDUSTRIAL' in the holderInfo.raw field.
So doc_count reports 5 documents, but the top_hits result only contains "03a3", "08a2" and "84a1"; "08a3" and "03a2" are missing.
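To spot this kind of mismatch in a larger response, one can compare each bucket's `doc_count` with the number of hits its `top_hits` sub-aggregation actually returned. A minimal Python sketch, assuming the response has already been parsed into a dict (e.g. with `json.loads`) and that the aggregations are named `498` and `tops` as in the query above; the function name `truncated_buckets` is hypothetical:

```python
import json


def truncated_buckets(response, agg_name="498", sub_agg="tops"):
    """Return (key, doc_count, returned) for every bucket whose top_hits
    sub-aggregation returned fewer hits than the bucket's doc_count."""
    result = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # Number of hits actually present in the top_hits "hits" array.
        returned = len(bucket[sub_agg]["hits"]["hits"])
        if returned < bucket["doc_count"]:
            result.append((bucket["key"], bucket["doc_count"], returned))
    return result


# Trimmed copy of the response above (only the fields the check reads).
raw = """
{"aggregations": {"498": {"buckets": [
  {"key": "MATSUSHITA ELECTRIC INDUSTRIAL", "doc_count": 5,
   "tops": {"hits": {"total": 5, "hits": [
     {"_id": "03a3"}, {"_id": "08a2"}, {"_id": "84a1"}]}}}
]}}}
"""

print(truncated_buckets(json.loads(raw)))
# [('MATSUSHITA ELECTRIC INDUSTRIAL', 5, 3)]
```

Note that `tops.hits.total` in the response is also 5, so all five documents are counted in the bucket; only the returned `hits` array is shorter.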
Query
GET /someindex/_search
{
"size": 0,
"query": {
"ids": {
"types": [],
"values": ["08a2","08a3","03a2","03a3","84a1"]
}
},
"aggregations": {
"498": {
"terms": {
"script": {
"inline": "doc['holderInfo.raw'].value"
},
"size": 50
}
}
}
}
Result
{
...
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"498": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "MATSUSHITA ELECTRIC INDUSTRIAL",
"doc_count": 3
}
]
}
}
}
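Also, the script-based terms aggregation leaves out two of the documents (doc_count is 3 instead of 5).
I would like to know why some uids are missing.
I am stuck with Elasticsearch 2.2. Is this a bug in that older version of Elasticsearch, or a mistake on my side?
Thanks!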
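By default, the top_hits aggregation returns only the top 3 hits per bucket (your result already shows "total": 5 inside tops.hits, so all five documents matched and only the returned list was cut off). You just need to increase its size parameter: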
GET /someindex/_search
{
"size": 0,
"query": {
"ids": {
"types": [],
"values": ["08a2","08a3","03a2","03a3","84a1"]
}
},
"aggregations": {
"498": {
"terms": {
"field": "holderInfo.raw",
"size": 50
},
"aggregations": {
"tops": {
"top_hits": {
"size": 5,
"_source": {
"includes": ["uid"]
}
}
}
}
}
}
}
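If the number of documents per bucket is not fixed, the request body can also be built programmatically so that the top_hits size is always set explicitly instead of falling back to the default of 3. A minimal Python sketch that only builds the body (sending it to `/someindex/_search` with an HTTP client is left out); the helper name `build_query` and its parameters are hypothetical, and the structure mirrors the corrected query above:

```python
import json


def build_query(ids, field="holderInfo.raw", hits_per_bucket=5, terms_size=50):
    """Build the search body with an explicit top_hits size.

    Without "size", top_hits returns at most 3 hits per bucket.
    """
    return {
        "size": 0,
        "query": {"ids": {"types": [], "values": list(ids)}},
        "aggregations": {
            "498": {
                "terms": {"field": field, "size": terms_size},
                "aggregations": {
                    "tops": {
                        "top_hits": {
                            "size": hits_per_bucket,
                            "_source": {"includes": ["uid"]},
                        }
                    }
                },
            }
        },
    }


ids = ["08a2", "08a3", "03a2", "03a3", "84a1"]
# A bucket can never hold more documents than the ids query matches,
# so asking for len(ids) hits per bucket returns all of them.
body = build_query(ids, hits_per_bucket=len(ids))
print(json.dumps(body, indent=2))
```

Tying the size to the length of the id list is only a convenience for this particular query; for open-ended queries a fixed upper bound is the more usual choice.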