How to create sub-bucket aggregations based on part of a hierarchy field's value

Suppose we have documents with a hierarchy field like the following:

POST subbuckets/_doc
{
  "hierarchy": "this/is/some/hierarchy"
}

POST subbuckets/_doc
{
  "hierarchy": "this/is/some/hierarchy2"
}

POST subbuckets/_doc
{
  "hierarchy": "this/is/another/hierarchy1"
}

I want to count the number of documents that belong to each level of the hierarchy.

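An analyzer cannot be applied to a keyword field, so to work around this we define the field as type text and enable aggregations on it by setting "fielddata": true. Please check the configuration below.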

Index mapping:

PUT index5
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path-analyzer": {
          "tokenizer": "path-tokenizer"
        }
      },
      "tokenizer": {
        "path-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "hierarchy": {
        "type": "text",
        "analyzer": "path-analyzer",
        "search_analyzer": "keyword",
        "fielddata": true
      }
    }
  }
}
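
To see why this mapping leads to one bucket per hierarchy level, here is a minimal Python sketch that imitates what a path_hierarchy tokenizer with delimiter "/" emits for a simple path. It is plain Python for illustration only, not Elasticsearch code; the function name path_hierarchy_tokens is hypothetical, and the sketch assumes the path neither starts nor ends with the delimiter.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Imitate path_hierarchy-style tokenization for a simple path.

    Assumption: the path neither starts nor ends with the delimiter.
    """
    parts = path.split(delimiter)
    # Emit every prefix of the path, shortest first.
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


for token in path_hierarchy_tokens("this/is/some/hierarchy"):
    print(token)
```

Each printed prefix corresponds to one term stored for the document, which is what the terms aggregation later groups on.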

Index the documents:

POST index5/_doc
{
  "hierarchy": "this/is/some/hierarchy"
}

POST index5/_doc
{
  "hierarchy": "this/is/some/hierarchy2"
}

POST index5/_doc
{
  "hierarchy": "this/is/another/hierarchy1"
}

Query:

POST index5/_search
{
  "aggs": {
    "path": {
      "terms": {
        "field": "hierarchy"
      }
    }
  },
  "size": 0
}

Response:

{
  "aggregations" : {
    "path" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "this",
          "doc_count" : 3
        },
        {
          "key" : "this/is",
          "doc_count" : 3
        },
        {
          "key" : "this/is/some",
          "doc_count" : 2
        },
        {
          "key" : "this/is/another",
          "doc_count" : 1
        },
        {
          "key" : "this/is/another/hierarchy1",
          "doc_count" : 1
        },
        {
          "key" : "this/is/some/hierarchy",
          "doc_count" : 1
        },
        {
          "key" : "this/is/some/hierarchy2",
          "doc_count" : 1
        }
      ]
    }
  }
}
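
The bucket counts above can be cross-checked locally. The following is a rough Python sketch, not Elasticsearch code: it expands each of the three sample paths into its prefixes, counts how many documents contain each prefix, and sorts by count (descending) and then key (ascending) to mirror the order seen in the response. The helper names prefixes, terms_buckets and buckets_at_depth are hypothetical.

```python
from collections import Counter

# The three sample documents indexed above.
DOCS = [
    "this/is/some/hierarchy",
    "this/is/some/hierarchy2",
    "this/is/another/hierarchy1",
]


def prefixes(path, delimiter="/"):
    """Return every prefix of the path, shortest first."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def terms_buckets(docs):
    """Count documents per prefix, ordered like the response above."""
    counts = Counter()
    for doc in docs:
        # set(): a document is counted at most once per term.
        counts.update(set(prefixes(doc)))
    # Sort by document count descending, then by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def buckets_at_depth(buckets, depth, delimiter="/"):
    """Keep only the buckets whose key has exactly `depth` segments."""
    return [(k, c) for k, c in buckets if k.count(delimiter) + 1 == depth]


buckets = terms_buckets(DOCS)
for key, doc_count in buckets:
    print(f"{key}: {doc_count}")

# Only the third level of the hierarchy.
print(buckets_at_depth(buckets, 3))
```

If only one level is needed from Elasticsearch itself, the terms aggregation's include parameter accepts a regular expression; a pattern such as [^/]+/[^/]+/[^/]+ might select the third level, though that pattern is an untested assumption here.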