Is it possible to perform elasticsearch nested stats aggregation on a field defined by the parent aggregation?
I'm trying to build a query to generate a plot. My data index looks like this:
"mappings": {
"mydata": {
"properties": {
"type": { "type": "string", "index": "not_analyzed" },
"stamp": { "type": "date", "format": "date_hour_minute_second_millis" },
"data": { "type": "object" }
}
}
}
Depending on the type, the data field contains a different object, for example:
temperature_data = {
"type": "temperature",
"stamp": "2015-11-01T15:25:19.123",
"data": {"temperature": 23.4, "variance": 0.0}
}
humidity_data = {
"type": "humidity",
"stamp": "2015-11-01T15:26:21.063",
"data": {"humidity": 75.1, "variance": 0.0}
}
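To make the goal concrete, here is a minimal local sketch in plain Python (no Elasticsearch involved) of what the nested aggregation is meant to compute: group by type (like the terms aggregation), bucket by time (like date_histogram), then compute stats over the type-specific field. The helper name `local_stats` and the fixed-seconds interval are illustrative assumptions; it also assumes, as in both examples, that the key under data has the same name as the type.

```python
from collections import defaultdict
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def local_stats(docs, interval_seconds):
    """Hypothetical helper: mimic terms(type) -> date_histogram(stamp) -> stats(data.<type>)."""
    # Outer level: one entry per "type" (the terms aggregation).
    # Inner level: one entry per time bucket (the date_histogram).
    grouped = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        stamp = datetime.strptime(doc["stamp"], "%Y-%m-%dT%H:%M:%S.%f")
        seconds = (stamp - EPOCH).total_seconds()
        bucket = int(seconds // interval_seconds) * interval_seconds
        # The reading lives under data.<type>, e.g. data.temperature.
        grouped[doc["type"]][bucket].append(doc["data"][doc["type"]])

    # Same keys that the Elasticsearch stats aggregation returns.
    result = {}
    for doc_type, buckets in grouped.items():
        result[doc_type] = {
            key: {
                "count": len(values),
                "min": min(values),
                "max": max(values),
                "avg": sum(values) / len(values),
                "sum": sum(values),
            }
            for key, values in sorted(buckets.items())
        }
    return result
```

The awkward part is exactly the `doc["data"][doc["type"]]` lookup: the field to aggregate depends on a value in the document, which a plain `"field"` parameter cannot express.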
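I'm trying to aggregate the data into buckets by type and then run a date histogram to get statistics for each reading (temperature, humidity). My problem is how to set the field on the stats aggregation, since it changes with the type (for example, for "type": "temperature" the field is data.temperature):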
query = {
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{"range" : {
"stamp" : {
"gt" : start_stamp,
"lt" : end_stamp
}
}}
]
}
}
}
},
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"field": "data."+field???
}
}
}
}
}
}
}
}
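If the reading types are known on the client, one workaround (not from the original thread) is to send one query per type: add a term filter on type and fill in "data." + type as the stats field, which is what the `"data."+field` placeholder is reaching for. A minimal sketch, assuming the same Elasticsearch 1.x filtered-query syntax as above and that each type's value sits under a key with the same name; `build_query` is a hypothetical helper.

```python
def build_query(start_stamp, end_stamp, interval, reading_type):
    """Hypothetical helper: one query per reading type, so the stats field is fixed."""
    return {
        "size": 0,
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"stamp": {"gt": start_stamp, "lt": end_stamp}}},
                            # Restrict to one type so data.<type> is the right field.
                            {"term": {"type": reading_type}},
                        ]
                    }
                }
            }
        },
        "aggs": {
            "points": {
                "date_histogram": {"field": "stamp", "interval": interval},
                "aggs": {
                    # e.g. "data.temperature" or "data.humidity"
                    "point_stats": {"stats": {"field": "data." + reading_type}}
                },
            }
        },
    }
```

The trade-off is one request per type instead of a single terms aggregation.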
* UPDATE *
As suggested, I added a data-type.groovy file to config/scripts/ containing the following:
return doc['data.temperature'].value
Elasticsearch is able to compile the script:
[2015-11-02 19:50:32,651][INFO ][script] [Atum] compiling script file [/home/user/elasticsearch-1.7.0/config/scripts/data-type.groovy]
I updated the query to load the script file:
query = {
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{"range" : {
"stamp" : {
"gt" : start_stamp,
"lt" : end_stamp
}
}}
]
}
}
}
},
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"script": {"file": "data-type"}
}
}
}
}
}
}
}
}
When I run the query, I get the following output:
{u'status': 400, u'error': u'SearchPhaseExecutionException[Failed to execute phase [query], ... Parse Failure [Unexpected token START_OBJECT in [point_stats].]]; }]'}
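There is only temperature data in the database, and if I change "script": {"file": "data-type"} to "field": "data.temperature", the query works.
One option is to rename the humidity and temperature fields to the same name, for example value, so that you can simply aggregate on that field. That is fine, because you already know what kind of value it is from the type field.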
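The rename can be done before indexing. A minimal sketch, assuming the documents are built in Python as in the question and that the reading is stored under a key named after the type; `normalize_reading` is a hypothetical helper, not part of any client library.

```python
def normalize_reading(doc):
    """Hypothetical helper: move data.<type> to data.value before indexing."""
    data = dict(doc["data"])  # copy so the caller's dict is not modified
    # e.g. for type "temperature", data["temperature"] becomes data["value"]
    data["value"] = data.pop(doc["type"])
    return {**doc, "data": data}
```

Other keys such as variance are kept as they are, so only the type-specific key changes.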
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"field": "data.value"
}
}
}
}
}
}
}
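For plotting, the nested buckets then need flattening into one series per type. A minimal sketch, assuming the standard response shape for these aggregations (terms buckets carry "key"; date_histogram buckets carry "key" in epoch milliseconds; stats returns count/min/max/avg/sum, with avg null for empty buckets); `series_by_type` is a hypothetical helper.

```python
def series_by_type(response):
    """Hypothetical helper: turn nested buckets into {type: [(timestamp_ms, avg), ...]}."""
    series = {}
    for type_bucket in response["aggregations"]["pathes"]["buckets"]:
        points = []
        for point in type_bucket["points"]["buckets"]:
            avg = point["point_stats"]["avg"]
            # Empty time buckets report avg as null (None), so skip them.
            if avg is not None:
                points.append((point["key"], avg))
        series[type_bucket["key"]] = points
    return series
```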
The second option is to use a script, but that hurts performance, and it scales poorly if you add more types of data (pressure, etc.):
"aggs": {
"pathes": {
"terms": {
"field": "type"
},
"aggs": {
"points": {
"date_histogram": {
"field": "stamp",
"interval": interval
},
"aggs": {
"point_stats": {
"stats": {
"script": "return doc.type.value == 'temperature' ? doc['data.temperature'].value : doc['data.humidity'].value"
}
}
}
}
}
}
}
Note that for the second option you need to enable dynamic scripting.
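If you do go the script route, the ternary can at least be generated rather than hand-edited for each new type. A minimal sketch, assuming each type's value is stored under data.<type>; `build_script` is a hypothetical helper and uses the equivalent `doc['type'].value` form. This only eases maintenance: the script still runs per document, so the performance caveat and the dynamic-scripting requirement remain.

```python
def build_script(types):
    """Hypothetical helper: generalise the answer's ternary script to any list of types."""
    if not types:
        raise ValueError("need at least one type")
    # The last type is the fallback branch, like 'humidity' in the original script.
    expr = f"doc['data.{types[-1]}'].value"
    for t in reversed(types[:-1]):
        expr = f"doc['type'].value == '{t}' ? doc['data.{t}'].value : ({expr})"
    return expr
```

For ["temperature", "humidity"] this yields the same logic as the script above.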