Histogram is not starting at the right min even with a filter added
Mapping
"eventTime": {
"type": "long"
},
Query
POST some_indices/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": {
"eventTime": {
"from": 1563120000000,
"to": 1565712000000,
"format": "epoch_millis"
}
}
}
}
},
"aggs": {
"min_eventTime": { "min" : { "field": "eventTime"} },
"max_eventTime": { "max" : { "field": "eventTime"} },
"time_series": {
"histogram": {
"field": "eventTime",
"interval": 86400000,
"min_doc_count" : 0,
"extended_bounds": {
"min": 1563120000000,
"max": 1565712000000
}
}
}
}
}
Response
"aggregations": {
"max_eventTime": {
"value": 1565539199997
},
"min_eventTime": {
"value": 1564934400000
},
"time_series": {
"buckets": [
{
"key": 1563062400000,
"doc_count": 0
},
{
"key": 1563148800000,
"doc_count": 0
},
{
...
Question
As clearly mentioned in the reference,
For filtering buckets, one should nest the histogram aggregation under a range filter aggregation with the appropriate from/to settings.
I set the filter correctly (just as the demo does), and the min and max aggregations confirm it.
But why is the first key smaller than from (or min_eventTime)?
This is so weird; I'm totally lost now ;(
Any advice would be appreciated ;)
References
I have found a solution now, but I feel this is a bug in Elasticsearch.
I am using date_histogram even though the field itself is a long, and with offset I shift the starting point forward to the correct timestamp.
"aggs": {
"time_series": {
"date_histogram": {
"field": "eventTime",
"interval": 86400000,
"offset": "+16h",
"min_doc_count": 0,
"extended_bounds": {
"min": 1563120000000,
"max": 1565712000000
}
},
"aggs": {
"order_amount_total": {
"sum": {
"field": "order_amount"
}
}
}
}
}
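As a quick sanity check (plain arithmetic, not an Elasticsearch API), the +16h offset is exactly 57600000 ms, the same remainder that shows up in the test further down:

public class OffsetCheck {
    public static void main(String[] args) {
        long sixteenHoursMs = 16L * 60 * 60 * 1000;     // 57600000
        System.out.println(sixteenHoursMs);
        System.out.println(1563120000000L % 86400000L); // 57600000 as well
    }
}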
Updated
Thanks to @Val's help, I rethought it and ran the following test:
import org.junit.Test;

@Test
public void testComputation() {
    // extended_bounds.min is not a multiple of the interval:
    System.out.println(1563120000000L % 86400000L); // 57600000
    // but the first bucket key Elasticsearch returned is:
    System.out.println(1563062400000L % 86400000L); // 0
}
I want to quote the documentation:
With extended_bounds setting, you now can "force" the histogram aggregation to start building buckets on a specific min value and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if min_doc_count is greater than 0).
But I believe the specific min value should be one of 0, interval, 2 * interval, 3 * interval, ..., rather than an arbitrary value like the one I used in the question.
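This matches the bucket key formula in the Elasticsearch histogram docs, bucket_key = Math.floor((value - offset) / interval) * interval + offset. A minimal sketch of that rounding in plain Java (my own demo, not Elasticsearch code):

public class BucketKeyDemo {
    // Rounds a value down to its bucket key, the way the histogram does:
    // bucket_key = Math.floor((value - offset) / interval) * interval + offset
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        long min = 1563120000000L, interval = 86400000L;
        System.out.println(bucketKey(min, interval, 0L));        // 1563062400000, the "smaller" first key
        System.out.println(bucketKey(min, interval, 57600000L)); // 1563120000000, aligned with min
    }
}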
So basically, in my case I can simply use the histogram's offset to fix the problem, as shown below. I don't actually need date_histogram at all.
"histogram": {
"field": "eventTime",
"interval": 86400000,
"offset": 57600000,
"min_doc_count" : 0,
"extended_bounds": {
"min": 1563120000000,
"max": 1565712000000
}
}
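In general, the offset that aligns bucket boundaries with an arbitrary extended_bounds.min is min modulo interval. A hypothetical helper (my own sketch, not part of any Elasticsearch client API):

public class AlignOffset {
    // Derive the histogram offset that makes buckets start exactly at min.
    static long alignOffset(long min, long interval) {
        return Math.floorMod(min, interval); // 1563120000000 % 86400000 = 57600000
    }
}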
A crystal-clear explanation from Elastic team member @polyfractal (thanks for the detailed explanation) confirms the same logic; more details can be found here.
I want to quote the design rationale here:
if we cut the aggregation off right at the extended_bounds.min/max, we would generate buckets that are not the full interval and that would break many assumptions about how the histogram works.