Elasticsearch: Aggregation For Random Fields
I have a document like the one below. Its structure is a "contents" field that holds many fields with random keys (note that the keys have no fixed format; they might be anything, e.g. UUIDs). I want to find the max value of start_time across all keys in "contents" with an ES query. What can I do for this?
The document:
{"contents": {
"key1": {
"start_time": "2020-08-01T00:00:19.500Z",
"last_event_published_time": "2020-08-01T23:59:03.738Z",
"last_event_timestamp": "2020-08-01T23:59:03.737Z",
"size": 1590513,
"read_offset": 1590513,
"name": "key1_name"
},
"key2": {
"start_time": "2020-08-01T00:00:19.500Z",
"last_event_published_time": "2020-08-01T23:59:03.738Z",
"last_event_timestamp": "2020-08-01T23:59:03.737Z",
"size": 1590513,
"read_offset": 1590513,
"name": "key2_name"
}
}}
I tried Joe's solution and it works. But when I modified the document like this:
{
  "timestamp": "2020-08-01T23:59:59.359Z",
  "type": "beats_stats",
  "beats_stats": {
    "metrics": {
      "filebeat": {
        "harvester": {
          "files": {
            "d47f60db-ac59-4b51-a928-0772a815438a": {
              "start_time": "2020-08-01T00:00:18.320Z",
              "last_event_published_time": "2020-08-01T23:59:03.738Z",
              "last_event_timestamp": "2020-08-01T23:59:03.737Z",
              "size": 1590513,
              "read_offset": 1590513,
              "name": "/data/logs/galogs/ga_log_2020-08-01.log"
            },
            "e47f60db-ac59-4b51-a928-0772a815438a": {
              "start_time": "2020-08-01T00:00:19.500Z",
              "last_event_published_time": "2020-08-01T23:59:03.738Z",
              "last_event_timestamp": "2020-08-01T23:59:03.737Z",
              "size": 1590513,
              "read_offset": 1590513,
              "name": "/data/logs/galogs/ga_log_2020-08-01.log"
            }
          }
        }
      }
    }
  }
}
it goes wrong:
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n ",
" ^---- HERE"
],
"script" : "\n for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n state.start_millis_arr.add(\n Instant.parse(entry.start_time).toEpochMilli()\n );\n }\n ",
"lang" : "painless"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "agg-test-index-1",
"node" : "B4mXZVgrTe-MsAQKMVhHUQ",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n ",
" ^---- HERE"
],
"script" : "\n for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n state.start_millis_arr.add(\n Instant.parse(entry.start_time).toEpochMilli()\n );\n }\n ",
"lang" : "painless",
"caused_by" : {
"type" : "null_pointer_exception",
"reason" : null
}
}
}
]}
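The stack trace points at the cause: in Painless, params._source is handed to the script as a plain map of nested maps, so a dotted string such as 'beats_stats.metrics.filebeat.harvester.files' is treated as one literal key. The lookup finds no such key, returns null, and calling .values() on it raises the null_pointer_exception above. A minimal sketch of the difference, using the field names from the document above:

// A single literal-key lookup misses, because _source has no key whose
// name is "beats_stats.metrics.filebeat.harvester.files":
def missing = params._source['beats_stats.metrics.filebeat.harvester.files'];  // -> null

// Walking the nested maps one level at a time reaches the sub-object:
def files = params._source['beats_stats']['metrics']['filebeat']['harvester']['files'];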
You can use a scripted_metric to calculate these. It's a bit tedious but certainly possible.
Mimicking your index and syncing a few docs:
POST myindex/_doc
{"contents":{"randomKey1":{"start_time":"2020-08-06T11:01:00.515Z"}}}
POST myindex/_doc
{"contents":{"35431fsf31_s35dfas":{"start_time":"2021-08-06T11:01:00.515Z"}}}
POST myindex/_doc
{"contents":{"999bc_123":{"start_time":"2019-08-06T11:01:00.515Z"}}}
Getting the max date of the unknown, randomly-keyed sub-objects:
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // sort in-place
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(
              // first is now the highest
              state.start_millis_arr[0]
            )
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}
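For the nested beats_stats documents from the question's update, the same aggregation should only need the map_script's lookup changed to walk the nested maps. A sketch under that assumption (the index name agg-test-index-1 is taken from the error output; the null-safe ?. guards and the max-folding reduce_script are my additions, replacing the original's "return states", which hands back one value per shard):

GET agg-test-index-1/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          // walk the nested maps level by level; ?. yields null instead of
          // throwing if a document is missing one of the levels
          def files = params._source.beats_stats?.metrics?.filebeat?.harvester?.files;
          if (files != null) {
            for (def entry : files.values()) {
              state.start_millis_arr.add(
                Instant.parse(entry.start_time).toEpochMilli()
              );
            }
          }
        """,
        "combine_script": """
          // per shard: return the highest epoch-millis value (or null if none)
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return state.start_millis_arr.isEmpty() ? null : state.start_millis_arr[0];
        """,
        "reduce_script": """
          // across shards: take the overall max and format it once
          Long max = null;
          for (def s : states) {
            if (s != null && (max == null || s > max)) { max = s; }
          }
          return max == null ? null : DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(max));
        """
      }
    }
  }
}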
BTW: @Sahil Gupta's comment is right: never use images where pasting the text is possible (and helpful).