Elasticsearch: Aggregate known object keys (not values)

My Elasticsearch instance has an index containing documents like these:

[{
  "_index": "products",
  "_type": "product",
  "_id": "100",
  "_score": 1,
  "_source": {
    "id": "100",
    "name": "Product 1",
    "catalogue": {
      "categories": {
        "cat1": ["h1", "spin2"],
        "cat5": ["h2", "spin2"]
      }
    }
  }
},
{
  "_index": "products",
  "_type": "product",
  "_id": "100",
  "_score": 1,
  "_source": {
    "id": "100",
    "name": "Product 1",
    "catalogue": {
      "categories": {
        "cat2": ["d1", "spin2"],
        "cat5": ["h2", "spin2"]
      }
    }
  }
}]

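I need to aggregate the known categories, i.e. the keys under catalogue.categories, not their values. For the documents above, the expected result is: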

"aggregations": {
  "categories": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "cat1",
        "doc_count": 1
      },
      {
        "key": "cat2",
        "doc_count": 1
      },
      {
        "key": "cat5",
        "doc_count": 2
      }
    ]
  }
}
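To make the expected buckets explicit, here is a minimal Python sketch that computes them client-side from the two sample _source bodies. It is not an Elasticsearch call; it only reproduces what a terms aggregation over the category keys should count, where each document contributes at most once to a key's doc_count. The function name category_key_buckets is hypothetical.

```python
from collections import Counter

# The _source bodies of the two sample documents from the question.
docs = [
    {"id": "100", "name": "Product 1",
     "catalogue": {"categories": {"cat1": ["h1", "spin2"], "cat5": ["h2", "spin2"]}}},
    {"id": "100", "name": "Product 1",
     "catalogue": {"categories": {"cat2": ["d1", "spin2"], "cat5": ["h2", "spin2"]}}},
]


def category_key_buckets(sources):
    """Count, per key of catalogue.categories, how many documents contain it."""
    counts = Counter()
    for src in sources:
        categories = src.get("catalogue", {}).get("categories", {})
        # Dict keys are unique within one document, so each document adds at
        # most 1 per key, matching the meaning of doc_count in a terms agg.
        counts.update(categories.keys())
    # Sorted by key to match the listing in the question; a real terms
    # aggregation orders buckets by doc_count (descending) by default.
    return [{"key": k, "doc_count": n} for k, n in sorted(counts.items())]


print(category_key_buckets(docs))
```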

How should I define the search request?

GET _search
{
  "aggregations": {
    "categories": {
      "terms": {
        ???
      }
    }
  }
}

Update: Should I use the script key, as below? That could hurt performance, right?

GET _search
{
  "aggregations": {
    "categories": {
      "terms": {
        "script" : "????"
      }
    }
  }
}

You can do it like this:

GET /products/product/_search?search_type=count
{
  "aggs": {
    "cats": {
      "terms": {
        "script": "categories=_source.catalogue.categories;terms=[];for(categ in categories.keySet())terms+=categ;return terms"
      }
    }
  }
}

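But yes, it will affect performance, mainly because the script has to load and parse _source for every matching document. You need to test it and see how it behaves on your data. Make sure to run the same query several times, because the first run can take longer (for example, while the script is compiled and caches warm up); that is normal.

Note that this answer uses Elasticsearch 1.x syntax: search_type=count was later deprecated in favor of "size": 0, and Groovy scripting was removed in 6.0 in favor of Painless.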
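If the scripted aggregation turns out to be too slow, one common alternative is to extract the keys at index time into their own field and run an ordinary terms aggregation on that field. The sketch below assumes a hypothetical field named category_keys, mapped as a non-analyzed string (keyword in Elasticsearch 5.0 and later); neither that field nor the helper with_category_keys comes from the original question.

```python
import json


def with_category_keys(source):
    """Return a copy of a product _source with a hypothetical 'category_keys'
    field listing the keys of catalogue.categories."""
    enriched = dict(source)  # shallow copy; the input dict is left untouched
    categories = source.get("catalogue", {}).get("categories", {})
    enriched["category_keys"] = sorted(categories)  # iterating a dict yields its keys
    return enriched


# Aggregation body that would replace the script-based one once documents are
# indexed with the extra field (assumes 'category_keys' is a keyword field).
AGG_BODY = {
    "size": 0,
    "aggs": {"cats": {"terms": {"field": "category_keys"}}},
}

doc = {"id": "100", "name": "Product 1",
       "catalogue": {"categories": {"cat1": ["h1", "spin2"], "cat5": ["h2", "spin2"]}}}
print(json.dumps(with_category_keys(doc)["category_keys"]))
print(json.dumps(AGG_BODY))
```

The trade-off is a little extra work and storage at index time in exchange for an aggregation that reads indexed values instead of parsing _source per document.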