How to include all docs in ElasticSearch Aggregation and avoid sum_other_doc_count > 0
ES is not my main line of work, and there is one behavior I have not been able to fix. I have a fairly simple aggregation query:
GET /my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "request_type": "some_type"
          }
        },
        {
          "match": {
            "carrier_name.keyword": "some_carrier"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_date": {
      "terms": {
        "field": "date",
        "order": {
          "_term": "asc"
        }
      },
      "aggs": {
        "carrier_total": {
          "sum": {
            "field": "total_count"
          }
        }
      }
    }
  }
}
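My understanding of https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html is that not all documents are included in the aggregation. Indeed, depending on the query part, I do see a "sum_other_doc_count" value greater than zero in the results.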
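My question: is there a way to build the search so that all documents are included? The number of documents is fairly small, generally under 1k.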
Thanks in advance,
Reuven
Increase the size of the terms agg from the default of 10 to a larger number:
...
"by_date": {
  "terms": {
    "field": "date",
    "order": {
      "_term": "asc"
    },
    "size": 1000 <-----
  }
...
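As a rough illustration of the change above, the sketch below builds the same request body in Python with the terms size exposed as a parameter, and checks a response for leftover documents. The names build_query and is_complete are hypothetical, and the response is assumed to follow the usual terms aggregation shape (aggregations.by_date.sum_other_doc_count).

```python
import json


def build_query(request_type, carrier, bucket_size=1000):
    """Build the request body from the question with an explicit terms size."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "query": {
            "bool": {
                "must": [
                    {"match": {"request_type": request_type}},
                    {"match": {"carrier_name.keyword": carrier}},
                ]
            }
        },
        "aggs": {
            "by_date": {
                "terms": {
                    "field": "date",
                    # Kept as in the question; newer Elasticsearch versions use "_key".
                    "order": {"_term": "asc"},
                    "size": bucket_size,
                },
                "aggs": {"carrier_total": {"sum": {"field": "total_count"}}},
            }
        },
    }


def is_complete(response, agg_name="by_date"):
    """Return True when no documents fell outside the returned buckets."""
    return response["aggregations"][agg_name]["sum_other_doc_count"] == 0


if __name__ == "__main__":
    print(json.dumps(build_query("some_type", "some_carrier"), indent=2))
```

The printed body can be sent with whatever client is already in use. Checking is_complete on the response is one way to notice when the chosen size has become too small for the data.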
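According to the documentation, size defaults to 10, and from + size can not be more than the index.max_result_window index setting, which defaults to 10,000. Note that this limit governs the top-level search size (the number of hits returned) rather than the size of a terms aggregation.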
In your case the number of documents is small, close to 1k, so 1k results can easily be retrieved.
The size parameter can be set to define how many term buckets should
be returned out of the overall terms list. By default, the node
coordinating the search process will request each shard to provide its
own top size term buckets and once all shards respond, it will reduce
the results to the final list that will then be returned to the
client.
So request the top 1000 terms on the date field:
...
"by_date": {
  "terms": {
    "field": "date",
    "order": {
      "_term": "asc"
    },
    "size": 1000
  }
...
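The higher the requested size, the more accurate the results will be, but the more expensive it is to compute the final results.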
For more details, you can refer to the official doc.
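To see why a small size leaves sum_other_doc_count above zero, here is a simplified single-shard model in Python. It is only a toy and not how Elasticsearch is implemented: terms_agg is a hypothetical helper that groups values, keeps the first size buckets in ascending key order, and counts the documents that landed in the dropped buckets. The dates are made-up data.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Toy single-shard terms aggregation ordered by key ascending."""
    counts = Counter(values)
    ordered = sorted(counts.items())  # ascending by term, as in the question
    kept = ordered[:size]
    dropped = ordered[size:]
    return {
        "buckets": [{"key": key, "doc_count": count} for key, count in kept],
        # Documents whose term did not make it into the returned buckets.
        "sum_other_doc_count": sum(count for _, count in dropped),
    }


# 12 distinct dates with two documents each (made-up data).
dates = [f"2020-01-{day:02d}" for day in range(1, 13)] * 2

small = terms_agg(dates)             # default size of 10 drops two buckets
large = terms_agg(dates, size=1000)  # size exceeds the number of distinct dates

print(small["sum_other_doc_count"], large["sum_other_doc_count"])  # prints: 4 0
```

With the default size, two of the twelve date buckets are cut off and their four documents show up in sum_other_doc_count. Once size is at least the number of distinct dates, every document is counted in some bucket.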