How to group distinct records in Elasticsearch

I have the following data in my Elasticsearch index:

{
  "title": "Hello from elastic",
  "name": "ABC",
  "j_id": "1",
  "date": "2021-03-02T12:29:31.356514"
},
{
  "title": "Hello from elastic",
  "name": "PQR",
  "j_id": "1",
  "date": "2021-03-02T12:29:31.356514"
},
{
  "title": "Hello from elastic",
  "name": "XYZ",
  "j_id": "2",
  "date": "2021-03-02T12:29:31.356514"
},
{
  "title": "Hello from elastic",
  "name": "MNO",
  "j_id": "3",
  "date": "2021-03-02T12:29:31.356514"
}

Now I want to get the unique records grouped by id (the j_id field).

Expected output:

{
    "1": [{
      "title": "Hello from elastic",
      "name": "ABC",
      "j_id": "1",
      "date": "2021-03-02T12:29:31.356514"
    },
    {
      "title": "Hello from elastic",
      "name": "PQR",
      "j_id": "1",
      "date": "2021-03-02T12:29:31.356514"
    }],
    "2": [{
      "title": "Hello from elastic",
      "name": "XYZ",
      "j_id": "2",
      "date": "2021-03-02T12:29:31.356514"
    }],
    "3": [{
      "title": "Hello from elastic",
      "name": "MNO",
      "j_id": "3",
      "date": "2021-03-02T12:29:31.356514"
    }]
}

I tried an aggregation query, but it only returns counts. I also want the latest records to be included in the response.

  1. How can I get unique records grouped by id from Elasticsearch?
  2. I want the most recently inserted data to come first.

Assuming a minimal mapping that covers the date and j_id fields:

PUT myindex
{
  "mappings": {
    "properties": {
      "j_id": {
        "type": "keyword"
      },
      "date": {
        "type": "date"
      }
    }
  }
}

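You can use a terms aggregation whose sub-aggregation is an ordered top_hits aggregation. The max_date sub-aggregation is there only to order the groups, so that the j_id values with the most recent documents come first: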

POST myindex/_search?filter_path=aggregations.*.buckets.key,aggregations.*.buckets.sorted_hits.hits.hits._source
{
  "size": 0,
  "aggs": {
    "by_j_id": {
      "terms": {
        "field": "j_id",
        "size": 10,
        "order": {
          "max_date": "desc"
        }
      },
      "aggs": {
        "max_date": {
          "max": {
            "field": "date"
          }
        },
        "sorted_hits": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
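
The two size values in the request are worth tuning: the terms size caps how many j_id groups are returned, and the top_hits size caps how many documents are returned per group. As a rough sketch, the same body could be built in Python with those limits exposed as parameters. The helper name build_group_query and its parameters max_groups and hits_per_group are hypothetical and not part of any Elasticsearch client; the function only produces a plain dict that would be sent as the search body.

```python
def build_group_query(group_field="j_id", date_field="date",
                      max_groups=10, hits_per_group=10):
    """Build the search body from the answer as a plain dict.

    Hypothetical helper for illustration only; it does not talk to
    Elasticsearch, it just assembles the request body.
    """
    return {
        "size": 0,  # no top-level hits, only aggregation results
        "aggs": {
            "by_" + group_field: {
                "terms": {
                    "field": group_field,
                    "size": max_groups,             # number of groups returned
                    "order": {"max_date": "desc"},  # newest groups first
                },
                "aggs": {
                    # Used only to order the terms buckets.
                    "max_date": {"max": {"field": date_field}},
                    # The documents of each group, newest first.
                    "sorted_hits": {
                        "top_hits": {
                            "size": hits_per_group,
                            "sort": [{date_field: {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }
```

Calling build_group_query() with no arguments yields the same body as the request above.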

The URL parameter filter_path trims the response body so that it closely mimics your desired format:

{
  "aggregations" : {
    "by_j_id" : {
      "buckets" : [
        {
          "key" : "1",
          "sorted_hits" : {
            "hits" : {
              "hits" : [
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "ABC",
                    "j_id" : "1",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                },
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "PQR",
                    "j_id" : "1",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "2",
          "sorted_hits" : {
            "hits" : {
              "hits" : [
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "XYZ",
                    "j_id" : "2",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                }
              ]
            }
          }
        },
        {
          "key" : "3",
          "sorted_hits" : {
            "hits" : {
              "hits" : [
                {
                  "_source" : {
                    "title" : "Hello from elastic",
                    "name" : "MNO",
                    "j_id" : "3",
                    "date" : "2021-03-02T12:29:31.356514"
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
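
The response is still nested under aggregations and buckets, so getting exactly the shape from the question (j_id mapped to a list of records) takes one small client-side step. A minimal Python sketch follows, assuming the response has already been parsed into a dict; the function name group_by_key is hypothetical, and the sample data is a shortened stand-in for the response above.

```python
def group_by_key(response, agg_name="by_j_id", hits_name="sorted_hits"):
    """Flatten the filtered aggregation response into {key: [records]}.

    Hypothetical helper; missing keys are treated as "no groups" so an
    empty response dict yields an empty result.
    """
    buckets = (
        response.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    return {
        bucket["key"]: [
            hit["_source"] for hit in bucket[hits_name]["hits"]["hits"]
        ]
        for bucket in buckets
    }


# Shortened sample in the same shape as the filtered response.
sample = {
    "aggregations": {
        "by_j_id": {
            "buckets": [
                {"key": "1", "sorted_hits": {"hits": {"hits": [
                    {"_source": {"name": "ABC", "j_id": "1"}},
                    {"_source": {"name": "PQR", "j_id": "1"}},
                ]}}},
                {"key": "2", "sorted_hits": {"hits": {"hits": [
                    {"_source": {"name": "XYZ", "j_id": "2"}},
                ]}}},
            ]
        }
    }
}

grouped = group_by_key(sample)
```

Because Python dicts keep insertion order, the groups in grouped appear in the same order as the buckets in the response.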