Elasticsearch: Aggregation For Random Fields
I have a document like the one below. Its structure is a "contents" field that holds many fields with random keys (note that the keys have no fixed format; they might be anything, e.g. UUIDs). I want to find the max value of start_time across all keys in "contents" with an ES query. What can I do for this?
The document:
{"contents": {
"key1": {
"start_time": "2020-08-01T00:00:19.500Z",
"last_event_published_time": "2020-08-01T23:59:03.738Z",
"last_event_timestamp": "2020-08-01T23:59:03.737Z",
"size": 1590513,
"read_offset": 1590513,
"name": "key1_name"
},
"key2": {
"start_time": "2020-08-01T00:00:19.500Z",
"last_event_published_time": "2020-08-01T23:59:03.738Z",
"last_event_timestamp": "2020-08-01T23:59:03.737Z",
"size": 1590513,
"read_offset": 1590513,
"name": "key2_name"
}
}}
I tried Joe's solution and it works. But when I modified the document like this:
{
  "timestamp": "2020-08-01T23:59:59.359Z",
  "type": "beats_stats",
  "beats_stats": {
    "metrics": {
      "filebeat": {
        "harvester": {
          "files": {
            "d47f60db-ac59-4b51-a928-0772a815438a": {
              "start_time": "2020-08-01T00:00:18.320Z",
              "last_event_published_time": "2020-08-01T23:59:03.738Z",
              "last_event_timestamp": "2020-08-01T23:59:03.737Z",
              "size": 1590513,
              "read_offset": 1590513,
              "name": "/data/logs/galogs/ga_log_2020-08-01.log"
            },
            "e47f60db-ac59-4b51-a928-0772a815438a": {
              "start_time": "2020-08-01T00:00:19.500Z",
              "last_event_published_time": "2020-08-01T23:59:03.738Z",
              "last_event_timestamp": "2020-08-01T23:59:03.737Z",
              "size": 1590513,
              "read_offset": 1590513,
              "name": "/data/logs/galogs/ga_log_2020-08-01.log"
            }
          }
        }
      }
    }
  }
}
it goes wrong:
"error" : {
"root_cause" : [
{
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n ",
" ^---- HERE"
],
"script" : "\n for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n state.start_millis_arr.add(\n Instant.parse(entry.start_time).toEpochMilli()\n );\n }\n ",
"lang" : "painless"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "agg-test-index-1",
"node" : "B4mXZVgrTe-MsAQKMVhHUQ",
"reason" : {
"type" : "script_exception",
"reason" : "runtime error",
"script_stack" : [
"for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n ",
" ^---- HERE"
],
"script" : "\n for (def entry : params._source['beats_stats.metrics.filebeat.harvester.files'].values()) {\n state.start_millis_arr.add(\n Instant.parse(entry.start_time).toEpochMilli()\n );\n }\n ",
"lang" : "painless",
"caused_by" : {
"type" : "null_pointer_exception",
"reason" : null
}
}
}
]}
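The stack trace points at the cause: in Painless, params._source is handed to the script as a plain map of nested maps, so a dotted string such as 'beats_stats.metrics.filebeat.harvester.files' is treated as one literal key. The lookup finds no such key, returns null, and calling .values() on it raises the null_pointer_exception above. A minimal sketch of the difference, using the field names from the document above:

// A single literal-key lookup misses, because _source has no key whose
// name is "beats_stats.metrics.filebeat.harvester.files":
def missing = params._source['beats_stats.metrics.filebeat.harvester.files'];  // -> null

// Walking the nested maps one level at a time reaches the sub-object:
def files = params._source['beats_stats']['metrics']['filebeat']['harvester']['files'];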
You can use a scripted_metric to calculate these. It's a bit tedious but certainly possible.
Mimicking your index and syncing a few docs:
POST myindex/_doc
{"contents":{"randomKey1":{"start_time":"2020-08-06T11:01:00.515Z"}}}
POST myindex/_doc
{"contents":{"35431fsf31_s35dfas":{"start_time":"2021-08-06T11:01:00.515Z"}}}
POST myindex/_doc
{"contents":{"999bc_123":{"start_time":"2019-08-06T11:01:00.515Z"}}}
Getting the max date of the unknown, randomly-keyed sub-objects:
GET myindex/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          for (def entry : params._source['contents'].values()) {
            state.start_millis_arr.add(
              Instant.parse(entry.start_time).toEpochMilli()
            );
          }
        """,
        "combine_script": """
          // sort in-place
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return DateTimeFormatter.ISO_INSTANT.format(
            Instant.ofEpochMilli(
              // first is now the highest
              state.start_millis_arr[0]
            )
          );
        """,
        "reduce_script": "return states"
      }
    }
  }
}
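For the nested beats_stats documents from the question's update, the same aggregation should only need the map_script's lookup changed to walk the nested maps. A sketch under that assumption (the index name agg-test-index-1 is taken from the error output; the null-safe ?. guards and the max-folding reduce_script are my additions, replacing the original's "return states", which hands back one value per shard):

GET agg-test-index-1/_search
{
  "size": 0,
  "aggs": {
    "max_start_date": {
      "scripted_metric": {
        "init_script": "state.start_millis_arr = [];",
        "map_script": """
          // walk the nested maps level by level; ?. yields null instead of
          // throwing if a document is missing one of the levels
          def files = params._source.beats_stats?.metrics?.filebeat?.harvester?.files;
          if (files != null) {
            for (def entry : files.values()) {
              state.start_millis_arr.add(
                Instant.parse(entry.start_time).toEpochMilli()
              );
            }
          }
        """,
        "combine_script": """
          // per shard: return the highest epoch-millis value (or null if none)
          Collections.sort(state.start_millis_arr, Collections.reverseOrder());
          return state.start_millis_arr.isEmpty() ? null : state.start_millis_arr[0];
        """,
        "reduce_script": """
          // across shards: take the overall max and format it once
          Long max = null;
          for (def s : states) {
            if (s != null && (max == null || s > max)) { max = s; }
          }
          return max == null ? null : DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(max));
        """
      }
    }
  }
}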
BTW: @Sahil Gupta's comment is right: never use images where pasting the text is possible (and helpful).