Elasticsearch: aggregating by a nested field with variable nesting (or over a particular JSON field)

I have the following structure, as returned by GET /index-*/_mapping:

    "top_field" : {
      "properties" : {
        "dict_key1" : {
          "properties" : {
            "field1" : {...},
            "field2" : {...},
            "field3" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "field4" : {...}
          }
        },
        "dict_key2" : {
          "properties" : {
            "field1" : {...},
            "field2" : {...},
            "field3" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "field4" : {...}
          }
        },
        "dict_key3": ...
      }
    }

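In other words, top_field stores a JSON object.

I want to aggregate on 'field3.keyword' regardless of dict_key*. Something like top_field.*.field3.keyword.

However, I can't get it to work with a terms aggregation, with or without nested. I also tried bucketing by the different dict_key* values, which would be almost as good, but I couldn't get that to work either.

How can I do this?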

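TL;DR I ran into the same problem a while ago (Terms aggregation with nested wildcard path), and it turned out that this is not directly possible because of the way lookups and path accessors are performed.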


There is a scripted workaround, though:

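{
  "size": 0,
  "aggs": {
    "terms_emulator": {
      "scripted_metric": {
        "init_script": "state.keyword_counts = [:]",
        "map_script": """
          def source = params._source['top_field'];
          // Skip documents that have no top_field at all
          if (source != null) {
            for (def key : source.keySet()) {
              // Skip dict_key* objects that have no field3
              if (!source[key].containsKey('field3')) continue;

              def field3_kw = source[key]['field3'];

              if (state.keyword_counts.containsKey(field3_kw)) {
                state.keyword_counts[field3_kw] += 1;
              } else {
                state.keyword_counts[field3_kw] = 1;
              }
            }
          }
        """,
        "combine_script": "state",
        "reduce_script": """
          // One state arrives per shard; merge them rather than keeping only states[0]
          def merged = [:];
          for (def s : states) {
            if (s == null) continue;
            for (def entry : s.keyword_counts.entrySet()) {
              def k = entry.getKey();
              merged[k] = merged.containsKey(k) ? merged[k] + entry.getValue() : entry.getValue();
            }
          }
          return ['keyword_counts': merged];
        """
      }
    }
  }
}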

which produces something like:

"aggregations" : {
  "terms_emulator" : {
    "value" : {
      "keyword_counts" : {
        "world" : 1,
        "hello" : 2
      }
    }
  }
}

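While this works fine, I wouldn't recommend using scripts in production. You are better off restructuring your data so that direct lookups become possible. For example: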

{
  "top_field": {
    "entries": [
      {
        "group_name": "dict_key1",
        "key_value_pairs": {
          "field3": "hello"
        }
      },
      {
        "group_name": "dict_key2",
        "key_value_pairs": {
          "field3": "world"
        }
      }
    ]
  }
}

and make entries nested. You could even drop top_field, since it looks redundant, and start directly from entries.
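
To make the restructuring suggestion concrete, here is a minimal sketch of what the mapping and the aggregation could look like. It assumes Elasticsearch 7 or later (typeless mappings) and a hypothetical index called my-index. The choice of keyword for group_name is also an assumption; only the field names come from the example document above. The first body would be sent with PUT /my-index, the second with GET /my-index/_search.

```json
{
  "mappings": {
    "properties": {
      "top_field": {
        "properties": {
          "entries": {
            "type": "nested",
            "properties": {
              "group_name": { "type": "keyword" },
              "key_value_pairs": {
                "properties": {
                  "field3": {
                    "type": "text",
                    "fields": {
                      "keyword": { "type": "keyword", "ignore_above": 256 }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

```json
{
  "size": 0,
  "aggs": {
    "entries": {
      "nested": { "path": "top_field.entries" },
      "aggs": {
        "field3_terms": {
          "terms": {
            "field": "top_field.entries.key_value_pairs.field3.keyword"
          }
        }
      }
    }
  }
}
```

Because the field path is now fixed, a plain terms aggregation inside a nested aggregation is enough and no wildcard is needed. Note that the bucket counts here are counts of nested entries, not of parent documents, which matches what the script above counts. Bucketing per former dict_key* should work the same way by running the terms aggregation on top_field.entries.group_name instead.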