How to sort before aggregation in Elasticsearch?

I have an Elasticsearch index with the following mapping:

{
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "fields":{
                    "keyword":{
                        "type":"keyword",
                        "ignore_above":20
                    }
                }
            },
            "result_nums":{
                "type":"integer"
            }
        }
    }
}

All documents in the index look like this:

{
  "content": "this",
  "result_nums": 40
},
{
  "content": "this",
  "result_nums": 40
},
{
  "content": "that",
  "result_nums": 40
},
{
  "content": "what",
  "result_nums": 50
},
{
  "content": "what",
  "result_nums": 50
},
{
  "content": "but",
  "result_nums": 100
},
{
  "content": "like",
  "result_nums": 20
}

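I need to fetch the data sorted by result_nums DESC, with duplicate "content" values removed. For example, I use the following query to get the top two entries: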

{
    "size": 0,
    "aggs": {
        "content": {
            "terms": {
                "field": "content.keyword",
                "size": 2
            },
            "aggs": {
                "res_nums": {
                    "avg": {
                        "field": "result_nums"
                    }
                },
                "res_sort": {
                    "bucket_sort": {
                        "sort": [
                            {
                                "res_nums": "desc"
                            }
                        ]
                    }
                }
            }
        }
    }
}

The result I expect is:

                {
                    "key": "but",
                    "doc_count": 1,
                    "res_nums": {
                        "value": 100.0
                    }
                },
                {
                    "key": "what",
                    "doc_count": 2,
                    "res_nums": {
                        "value": 50.0
                    }
                }

But what I actually get is:

                {
                    "key": "what",
                    "doc_count": 2,
                    "res_nums": {
                        "value": 50.0
                    }
                },
                {
                    "key": "this",
                    "doc_count": 2,
                    "res_nums": {
                        "value": 40.0
                    }
                }

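So I think ES needs to sort before aggregating. At the moment the sort is applied only after the aggregation, so the result does not match what I expect.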

I tried adding a sort before the aggregation, but it had no effect:

{
    "size": 0,
    "sort": [
        {
            "result_nums": "desc"
        }
    ],
    "aggs": {
    ...
    }
...
}

So how can I sort before aggregating?

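You need to use a max aggregation together with the order parameter of the terms aggregation to get the data sorted by result_nums DESC with duplicate "content" values removed. In your query, the terms aggregation first keeps only the two buckets with the highest doc_count, and bucket_sort then sorts just those two, so "but" (doc_count 1) is dropped before the sort ever runs.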
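To make the difference concrete, here is a small Python simulation of the two strategies on the sample documents from the question. It is only a sketch: the function names are made up for illustration, it assumes ties on doc_count are broken by key in ascending order, and it ignores sharding, so it mimics the behaviour rather than reproducing what Elasticsearch does internally.

```python
from collections import defaultdict

# Sample documents copied from the question.
DOCS = [
    {"content": "this", "result_nums": 40},
    {"content": "this", "result_nums": 40},
    {"content": "that", "result_nums": 40},
    {"content": "what", "result_nums": 50},
    {"content": "what", "result_nums": 50},
    {"content": "but", "result_nums": 100},
    {"content": "like", "result_nums": 20},
]


def group_by_content(docs):
    """Collect the result_nums values of each distinct content (one bucket per term)."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["content"]].append(doc["result_nums"])
    return buckets


def top_by_count_then_sort(docs, size):
    """Mimic the question's query: truncate to `size` buckets by doc_count, then sort by average."""
    buckets = group_by_content(docs)
    # Assumption: equal doc_counts are ordered by key ascending.
    kept = sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:size]
    averaged = [(key, sum(vals) / len(vals)) for key, vals in kept]
    return sorted(averaged, key=lambda kv: -kv[1])


def top_by_max(docs, size):
    """Mimic ordering the buckets by max(result_nums) before truncating to `size`."""
    buckets = group_by_content(docs)
    ranked = sorted(
        ((key, float(max(vals))) for key, vals in buckets.items()),
        key=lambda kv: -kv[1],
    )
    return ranked[:size]


print(top_by_count_then_sort(DOCS, 2))  # [('what', 50.0), ('this', 40.0)]
print(top_by_max(DOCS, 2))              # [('but', 100.0), ('what', 50.0)]
```

The first function truncates before sorting and loses "but"; the second sorts first and keeps it.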

Adding a working example

Search query:

{
  "size": 0,
  "aggs": {
    "content": {
      "terms": {
        "field": "content.keyword",
        "order": {
          "max_num": "desc"
        },
        "size":2
      },
      "aggs": {
        "max_num": {
          "max": {
            "field": "result_nums"
          }
        }
      }
    }
  }
}
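If you build this request from application code, a small helper along these lines may be handy. This is a hedged sketch: `build_top_terms_query` and its parameters are hypothetical names, not part of any Elasticsearch client, and the function only produces the request body as a plain dict that you would then send with whatever client you use.

```python
import json


def build_top_terms_query(group_field, metric_field, size):
    """Build a request body for the top `size` terms of group_field, ordered by max(metric_field) descending."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "aggs": {
            "content": {
                "terms": {
                    "field": group_field,
                    # Order the buckets by the sub-aggregation below, not by doc_count.
                    "order": {"max_num": "desc"},
                    "size": size,
                },
                "aggs": {"max_num": {"max": {"field": metric_field}}},
            }
        },
    }


body = build_top_terms_query("content.keyword", "result_nums", 2)
print(json.dumps(body, indent=2))
```

Called with the field names from the question, it yields the same structure as the query above.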

Search result:

"aggregations": {
    "content": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 4,
      "buckets": [
        {
          "key": "but",
          "doc_count": 1,
          "max_num": {
            "value": 100.0
          }
        },
        {
          "key": "what",
          "doc_count": 2,
          "max_num": {
            "value": 50.0
          }
        }
      ]
    }