How to sort and limit aggregations in Elasticsearch
For example, I have records with the following columns: (Country, City, Date, Income)
USA SF 2015-08 50
USA SF 2015-05 30
USA SF 2015-01 20
USA NY 2015-05 70
USA NY 2015-02 10
U.K LD 2015-05 90
My SQL is:
select country, city, max(date) as maxDate, sum(income) as sumIncome from testTable group by country, city order by maxDate desc, sumIncome desc limit 3
So the result should be:
USA SF 2015-08 100
U.K LD 2015-05 90
USA NY 2015-05 80
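To make the target concrete, here is a minimal Python sketch of what the SQL computes on the sample rows. The function name `group_sort_limit` and the in-memory `rows` list are illustrative assumptions, not part of Elasticsearch or of the original question.

```python
from collections import defaultdict

# Sample rows from the question: (country, city, date, income)
rows = [
    ("USA", "SF", "2015-08", 50),
    ("USA", "SF", "2015-05", 30),
    ("USA", "SF", "2015-01", 20),
    ("USA", "NY", "2015-05", 70),
    ("USA", "NY", "2015-02", 10),
    ("U.K", "LD", "2015-05", 90),
]


def group_sort_limit(rows, limit=3):
    """Group by (country, city), then order by max date and summed income."""
    groups = defaultdict(lambda: {"max_date": "", "sum_income": 0})
    for country, city, date, income in rows:
        group = groups[(country, city)]
        # yyyy-MM strings compare in chronological order
        group["max_date"] = max(group["max_date"], date)
        group["sum_income"] += income
    # One global sort across all (country, city) pairs, both keys descending
    ordered = sorted(
        groups.items(),
        key=lambda item: (item[1]["max_date"], item[1]["sum_income"]),
        reverse=True,
    )
    return [
        (country, city, group["max_date"], group["sum_income"])
        for (country, city), group in ordered[:limit]
    ]


expected = group_sort_limit(rows)
for row in expected:
    print(*row)
```

The important detail is that the sort runs once over all (country, city) pairs, which is what any Elasticsearch solution has to reproduce.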
The ES aggregation I wrote is below, but it is wrong:
"aggs":{"sub1": {"terms":{"field":"country"},
"aggs":{"sub2":{"terms":{"field":"city",
"order":[{"submax":"DESC"},{"subsum":"DESC"}]},
"aggs":{"submax":{"max":{"field":"date"}},"subsum":{"sum":{"field":"income"}}}}}}}
With the aggregation above, I get the following incorrect result:
USA SF 2015-08 100
USA NY 2015-05 80
U.K LD 2015-05 90
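The ordering above can be reproduced with a small Python simulation of nested bucketing. This is only a sketch: `nested_terms` is a hypothetical helper, and it assumes the outer terms aggregation uses its default ordering (document count, descending) while the inner one sorts cities by max date and summed income inside each country.

```python
from collections import defaultdict

# Sample rows from the question: (country, city, date, income)
rows = [
    ("USA", "SF", "2015-08", 50),
    ("USA", "SF", "2015-05", 30),
    ("USA", "SF", "2015-01", 20),
    ("USA", "NY", "2015-05", 70),
    ("USA", "NY", "2015-02", 10),
    ("U.K", "LD", "2015-05", 90),
]


def nested_terms(rows):
    """Bucket by country first, then sort cities only within each country."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row[0]].append(row)
    # Outer buckets: assumed default order, largest document count first
    outer = sorted(by_country.items(), key=lambda item: len(item[1]), reverse=True)
    result = []
    for country, members in outer:
        by_city = defaultdict(list)
        for row in members:
            by_city[row[1]].append(row)
        cities = [
            (country, city, max(r[2] for r in group), sum(r[3] for r in group))
            for city, group in by_city.items()
        ]
        # Inner sort never sees cities that belong to another country
        cities.sort(key=lambda t: (t[2], t[3]), reverse=True)
        result.extend(cities)
    return result


nested = nested_terms(rows)
for row in nested:
    print(*row)
```

Because USA holds five documents and U.K only one, every USA city is listed before LD, regardless of LD's income.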
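Now that I understand the requirement, you actually have two options.
Option 1
Use a script to concatenate the country field and the city field. Regular nested aggregations, one per field, cannot do what you want in Elasticsearch: the inner aggregation is sorted only within each bucket of the outer one, so there is never a single ordering across all country and city pairs.
Instead, you need to do something like this: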
GET /test/test/_search?search_type=count
{
"aggs": {
"sub1": {
"terms": {
"script": "doc['country'].value + ' ' + doc['city'].value",
"size": 3,
"order": [
{
"submax": "DESC"
},
{
"subsum": "DESC"
}
]
},
"aggs": {
"submax": {
"max": {
"field": "date"
}
},
"subsum": {
"sum": {
"field": "income"
}
}
}
}
}
}
With curl:
curl -XPOST "http://localhost:9200/livebox/type1/_search?search_type=count" -d'
{
"aggs": {
"sub1": {
"terms": {
"script": "doc[\"boxname\"].value + \" \" + doc[\"app\"].value",
"size": 3,
"order": [
{
"submax": "DESC"
},
{
"subsum": "DESC"
}
]
},
"aggs": {
"submax": {
"max": {
"field": "date"
}
},
"subsum": {
"sum": {
"field": "count"
}
}
}
}
}
}'
The aggregation result will contain terms of the form country + ' ' + city.
"buckets": [
{
"key": "usa sf",
"doc_count": 3,
"subsum": {
"value": 100
},
"submax": {
"value": 1438387200000,
"value_as_string": "2015-08"
}
},
{
"key": "uk ld",
"doc_count": 1,
"subsum": {
"value": 90
},
"submax": {
"value": 1430438400000,
"value_as_string": "2015-05"
}
},
{
"key": "usa ny",
"doc_count": 2,
"subsum": {
"value": 80
},
"submax": {
"value": 1430438400000,
"value_as_string": "2015-05"
}
}
]
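To turn these buckets back into rows like the SQL result, a client can split the key and convert the epoch-millisecond `submax` value. The following Python sketch assumes the bucket list has already been extracted from the response; `bucket_to_row` is a hypothetical helper, and splitting on the first space only works if country values never contain a space.

```python
from datetime import datetime, timezone

# Buckets as returned by the aggregation above
buckets = [
    {"key": "usa sf", "doc_count": 3, "subsum": {"value": 100},
     "submax": {"value": 1438387200000, "value_as_string": "2015-08"}},
    {"key": "uk ld", "doc_count": 1, "subsum": {"value": 90},
     "submax": {"value": 1430438400000, "value_as_string": "2015-05"}},
    {"key": "usa ny", "doc_count": 2, "subsum": {"value": 80},
     "submax": {"value": 1430438400000, "value_as_string": "2015-05"}},
]


def bucket_to_row(bucket):
    """Convert one bucket into (country, city, max date, summed income)."""
    # Assumption: the country part contains no space
    country, city = bucket["key"].split(" ", 1)
    # submax.value is milliseconds since the Unix epoch, in UTC
    max_date = datetime.fromtimestamp(bucket["submax"]["value"] / 1000, tz=timezone.utc)
    return country, city, max_date.strftime("%Y-%m"), bucket["subsum"]["value"]


table = [bucket_to_row(b) for b in buckets]
for row in table:
    print(*row)
```

If the values may contain spaces, a separator that cannot occur in the data would be a safer choice in the script than a single space.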
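Option 2
Use a _source transformation that builds a new field at indexing time. This moves the performance cost of running the script from aggregation time to indexing time.
The mapping of the index follows; it needs some changes regardless of what you have now: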
PUT /test
{
"mappings": {
"test": {
"transform": {
"script": "ctx._source['country_and_city'] = ctx._source['country'] + ' ' + ctx._source['city']"
},
"properties": {
"country": {
"type": "string"
},
"city": {
"type": "string"
},
"income": {
"type": "integer"
},
"date": {
"type": "date",
"format": "yyyy-MM"
},
"country_and_city": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
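As a rough illustration of what the transform script in this mapping computes for each document, here is the same concatenation in plain Python. The function `add_country_and_city` and the sample `doc` are hypothetical; this sketch runs outside Elasticsearch and only mirrors the script's logic.

```python
def add_country_and_city(source):
    """Return a copy of the document with the combined field added."""
    enriched = dict(source)  # copy, so the caller's document is not modified
    enriched["country_and_city"] = source["country"] + " " + source["city"]
    return enriched


doc = {"country": "USA", "city": "SF", "date": "2015-08", "income": 50}
indexed = add_country_and_city(doc)
print(indexed["country_and_city"])
```

Because the combined value is computed once per document, the aggregation can later use a plain field lookup on country_and_city instead of running a script for every document.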
The query:
GET /test/test/_search?search_type=count
{
"aggs": {
"sub1": {
"terms": {
"field": "country_and_city",
"order": [
{
"submax": "DESC"
},
{
"subsum": "DESC"
}
]
},
"aggs": {
"submax": {
"max": {
"field": "date"
}
},
"subsum": {
"sum": {
"field": "income"
}
}
}
}
}
}
The results:
"buckets": [
{
"key": "usa sf",
"doc_count": 3,
"subsum": {
"value": 100
},
"submax": {
"value": 1438387200000,
"value_as_string": "2015-08"
}
},
{
"key": "uk ld",
"doc_count": 1,
"subsum": {
"value": 90
},
"submax": {
"value": 1430438400000,
"value_as_string": "2015-05"
}
},
{
"key": "usa ny",
"doc_count": 2,
"subsum": {
"value": 80
},
"submax": {
"value": 1430438400000,
"value_as_string": "2015-05"
}
}
]