Get distinct counts by nested objects from Elasticsearch

I have an index with the following mapping:

{
    "mappings": {
        "properties": {
            "typed_obj": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "type": {"type": "keyword"}
                }
            }
        }
    }
}

and the following documents (in bulk request format):

{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "2", "type": "two"}]}
{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "2", "type": "one"}]}
{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "3", "type": "one"}]}
{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "4", "type": "two"}]}

How can I group typed_obj by type and count the unique ids per type? I want something like:

{
 "type": "one",
 "count": 3
},
{
 "type": "two",
 "count": 2
}

I built a query using a multi_terms aggregation inside a nested aggregation:

{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "obj_nested": {
            "nested": {
                "path": "typed_obj"
            },
            "aggs": {
                "by_type_and_id": {
                    "multi_terms": {
                        "terms": [
                            {
                                "field": "typed_obj.type"
                            },
                            {
                                "field": "typed_obj.id"
                            }
                        ]
                    }
                }
            }
        }
    },
    "size": 0
}

It returns:

"buckets": [
    {
        "key": ["one", "1"],
        "key_as_string": "one|1",
        "doc_count": 4
    },
    {
        "key": ["one", "2"],
        "key_as_string": "one|2",
        "doc_count": 1
    },
    {
        "key": ["one", "3"],
        "key_as_string": "one|3",
        "doc_count": 1
    },
    {
        "key": ["two", "2"],
        "key_as_string": "two|2",
        "doc_count": 1
    },
    {
        "key": ["two", "4"],
        "key_as_string": "two|4",
        "doc_count": 1
    }
]

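In the backend application I can group the keys by their first element (the typed_obj type) and then take the size of each group. My question is: is it possible to get the type/count pairs directly, without fetching every id+type combination from the index?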
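The backend workaround described in the question could look roughly like the sketch below. It assumes the `buckets` list from the multi_terms response has already been parsed into Python dicts; the function name `count_ids_per_type` is hypothetical. Each multi_terms bucket is already a unique (type, id) pair, so counting the ids per type gives the desired result. Keep in mind that multi_terms only returns the top `size` buckets (10 by default), so with many distinct pairs this approach silently undercounts unless `size` is raised.

```python
from collections import defaultdict


def count_ids_per_type(buckets):
    """Group multi_terms buckets by their first key (typed_obj.type) and
    count the distinct second keys (typed_obj.id) seen for each type."""
    ids_by_type = defaultdict(set)
    for bucket in buckets:
        obj_type, obj_id = bucket["key"]
        ids_by_type[obj_type].add(obj_id)
    # Dicts keep insertion order, so types appear in bucket order.
    return [{"type": t, "count": len(ids)} for t, ids in ids_by_type.items()]


# Buckets from the multi_terms response above (other fields omitted).
buckets = [
    {"key": ["one", "1"]},
    {"key": ["one", "2"]},
    {"key": ["one", "3"]},
    {"key": ["two", "2"]},
    {"key": ["two", "4"]},
]
print(count_ids_per_type(buckets))
# [{'type': 'one', 'count': 3}, {'type': 'two', 'count': 2}]
```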

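You need the cardinality aggregation to count distinct values. Put a terms aggregation on typed_obj.type inside the nested aggregation, and nest a cardinality aggregation on typed_obj.id under it.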

Query:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "obj_nested": {
      "nested": {
        "path": "typed_obj"
      },
      "aggs": {
        "type":{
          "terms": {
            "field": "typed_obj.type",
            "size": 10
          },
          "aggs": {
            "id": {
              "cardinality": {
                "field": "typed_obj.id"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
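To see where the numbers in the response come from, here is a minimal sketch that models the query's semantics in plain Python over the four sample documents. It is not how Elasticsearch computes the aggregation internally: the nested step is modeled as flattening every typed_obj entry into its own nested document, the terms step as counting nested documents per type, and the cardinality step as an exact set of ids (Elasticsearch approximates it). The function name `simulate` is hypothetical.

```python
from collections import defaultdict

# The four sample documents from the question.
docs = [
    {"typed_obj": [{"id": "1", "type": "one"}, {"id": "2", "type": "two"}]},
    {"typed_obj": [{"id": "1", "type": "one"}, {"id": "2", "type": "one"}]},
    {"typed_obj": [{"id": "1", "type": "one"}, {"id": "3", "type": "one"}]},
    {"typed_obj": [{"id": "1", "type": "one"}, {"id": "4", "type": "two"}]},
]


def simulate(docs):
    # nested: every typed_obj entry is treated as a separate nested document.
    nested = [obj for doc in docs for obj in doc["typed_obj"]]
    doc_counts = defaultdict(int)    # terms: nested documents per type
    distinct_ids = defaultdict(set)  # cardinality: distinct ids per type (exact here)
    for obj in nested:
        doc_counts[obj["type"]] += 1
        distinct_ids[obj["type"]].add(obj["id"])
    # Order buckets like terms does by default: doc_count descending, then key.
    order = sorted(doc_counts, key=lambda t: (-doc_counts[t], t))
    return {
        "doc_count": len(nested),
        "buckets": [
            {"key": t, "doc_count": doc_counts[t], "id": {"value": len(distinct_ids[t])}}
            for t in order
        ],
    }


print(simulate(docs))
```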

Response:

"aggregations" : {
    "obj_nested" : {
      "doc_count" : 8,
      "type" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "one",
            "doc_count" : 6,
            "id" : {
              "value" : 3
            }
          },
          {
            "key" : "two",
            "doc_count" : 2,
            "id" : {
              "value" : 2
            }
          }
        ]
      }
    }
  }
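To turn this response into the `{"type": ..., "count": ...}` pairs asked for in the question, the client only needs to read each bucket's key and the nested cardinality value. A minimal sketch, assuming the response body has already been parsed into a Python dict (how it is fetched is outside this sketch); the function name `type_counts` is hypothetical, while the aggregation names `obj_nested`, `type` and `id` come from the query above.

```python
def type_counts(response):
    """Convert the aggregation response into [{"type": ..., "count": ...}, ...]."""
    buckets = response["aggregations"]["obj_nested"]["type"]["buckets"]
    return [{"type": b["key"], "count": b["id"]["value"]} for b in buckets]


# The response shown above, as a parsed dict.
response = {
    "aggregations": {
        "obj_nested": {
            "doc_count": 8,
            "type": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "one", "doc_count": 6, "id": {"value": 3}},
                    {"key": "two", "doc_count": 2, "id": {"value": 2}},
                ],
            },
        }
    }
}

print(type_counts(response))
# [{'type': 'one', 'count': 3}, {'type': 'two', 'count': 2}]
```

Unlike the multi_terms approach, only one bucket per type crosses the wire, regardless of how many distinct ids exist.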

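Note: the Elasticsearch documentation describes the cardinality aggregation as "A single-value metrics aggregation that calculates an approximate count of distinct values." For large numbers of distinct ids, the reported count may therefore not be exact.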
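If the approximation matters, the cardinality aggregation accepts a `precision_threshold` option that trades memory for accuracy: counts below the threshold are expected to be close to accurate. The sketch below builds the same request body as a Python dict with that option set. It assumes you construct the query in Python; the function name `build_query` is hypothetical, and 3000 (the documented default) is only an illustrative value.

```python
import json


def build_query(precision_threshold=3000):
    """Request body for distinct ids per typed_obj.type, with an explicit
    precision_threshold on the cardinality aggregation."""
    return {
        "query": {"match_all": {}},
        "aggs": {
            "obj_nested": {
                "nested": {"path": "typed_obj"},
                "aggs": {
                    "type": {
                        "terms": {"field": "typed_obj.type", "size": 10},
                        "aggs": {
                            "id": {
                                "cardinality": {
                                    "field": "typed_obj.id",
                                    "precision_threshold": precision_threshold,
                                }
                            }
                        },
                    }
                },
            }
        },
        "size": 0,
    }


print(json.dumps(build_query(), indent=2))
```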