How to query distinct count distribution in Elasticsearch
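The cardinality aggregation calculates an approximate count of distinct values. How can we compute the distribution of how often each distinct value occurs in the documents?
For example, suppose we have:
a,a,a,b,b,b,c,c,d,d,e
The distinct count distribution would be:
3: 2 # number of distinct elements that occur 3 times (a, b)
2: 2 # c, d
1: 1 # e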
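To make the target output concrete, here is a minimal Python sketch of the same "count of counts" computation done in memory, outside Elasticsearch. The function name `distinct_count_distribution` is made up for illustration; it only shows what the result should look like, not how Elasticsearch computes it.

```python
from collections import Counter


def distinct_count_distribution(values):
    """Map each occurrence count to the number of distinct values having it."""
    # First pass: how many times does each distinct value occur?
    occurrences = Counter(values)
    # Second pass: how many distinct values share each occurrence count?
    return dict(Counter(occurrences.values()))


letters = "a,a,a,b,b,b,c,c,d,d,e".split(",")
distribution = distinct_count_distribution(letters)
print(distribution)
```

The two passes correspond to the two stages any solution needs: group by value and count, then group by that count.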
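You cannot actually do this with a single aggregation.
However, using the transform API (https://www.elastic.co/guide/en/elasticsearch/reference/current/transform-examples.html) you can create a new index that holds one document per distinct value, and then run a simple terms aggregation on it: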
PUT _transform/so
{
  "dest": {
    "index": "my-so"
  },
  "source": {
    "index": "my-index"
  },
  "pivot": {
    "group_by": {
      "country": {
        "terms": {
          "field": "letter"
        }
      }
    },
    "aggregations": {
      "cardinality": {
        "value_count": {
          "field": "letter"
        }
      }
    }
  }
}
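Once the transform has been started (POST _transform/so/_start) and has processed the source index, this gives you one document per distinct value, where "cardinality" holds the number of occurrences of that value: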
[
  {
    "country" : "a",
    "cardinality" : 22
  },
  {
    "country" : "b",
    "cardinality" : 4
  },
  {
    "country" : "c",
    "cardinality" : 5049
  }...
Then you can use a simple terms or histogram aggregation on the new index:
GET /my-so/_search
{
  "size": 0,
  "aggs": {
    "cc": {
      "terms": {
        "field": "cardinality"
      }
    }
  }
}
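Each bucket in the response of that search has the occurrence count as its `key` and the number of distinct values with that count as its `doc_count`. As a rough sketch, the response body can be turned into the distribution from the question like this. The helper name `distribution_from_response` is hypothetical, and `sample_response` is a hand-written illustration shaped like a terms aggregation response for the a..e example, not real output from a cluster.

```python
def distribution_from_response(response, agg_name="cc"):
    """Turn terms-aggregation buckets into {occurrence_count: distinct_values}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written sample (an assumption for illustration), trimmed to the
# fields this helper reads.
sample_response = {
    "aggregations": {
        "cc": {
            "buckets": [
                {"key": 3, "doc_count": 2},
                {"key": 2, "doc_count": 2},
                {"key": 1, "doc_count": 1},
            ]
        }
    }
}

result = distribution_from_response(sample_response)
print(result)
```

Note that a terms aggregation returns only the top 10 buckets by default, so for a long-tailed distribution you would likely need to raise its `size` or switch to a histogram aggregation.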