Histogram does not start at the right min even though a filter is added

Mapping

          "eventTime": {
            "type": "long"
          },

Query

POST some_indices/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "eventTime": {
            "from": 1563120000000,
            "to": 1565712000000,
            "format": "epoch_millis"
          }
        }
      }
    }
  },
  "aggs": {
    "min_eventTime": { "min": { "field": "eventTime" } },
    "max_eventTime": { "max": { "field": "eventTime" } },
    "time_series": {
      "histogram": {
        "field": "eventTime",
        "interval": 86400000,
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 1563120000000,
          "max": 1565712000000
        }
      }
    }
  }
}

Response

"aggregations": {
    "max_eventTime": {
      "value": 1565539199997
    },
    "min_eventTime": {
      "value": 1564934400000
    },
    "time_series": {
      "buckets": [
        {
          "key": 1563062400000,
          "doc_count": 0
        },
        {
          "key": 1563148800000,
          "doc_count": 0
        },
        {
        ...

Question

As clearly mentioned in the reference:

For filtering buckets, one should nest the histogram aggregation under a range filter aggregation with the appropriate from/to settings.

I set up the filter correctly (just as the demo does), and the min/max aggregations provide evidence of that.

But why does the first key start at 1563062400000 rather than at from (or min_eventTime)?

So weird. I'm totally lost now ;(

Any advice would be appreciated ;)

References

I have now found a solution, but I feel it is a bug in Elasticsearch.

I am using date_histogram even though the field itself is a long, and with offset I shift the starting point forward to the right timestamp.

  "aggs": {
    "time_series": {
      "date_histogram": {
        "field": "eventTime",
        "interval": 86400000,
        "offset": "+16h",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 1563120000000,
          "max": 1565712000000
        }
      },
      "aggs": {
        "order_amount_total": {
          "sum": {
            "field": "order_amount"
          }
        }
      }
    }
  }

Updated

Thanks to @Val's help, I rethought it and ran the following test:

@Test
public void testComputation() {
    System.out.println(1563120000000L % 86400000L); // 57600000
    System.out.println(1563062400000L % 86400000L); // 0
}

I'd like to quote the docs:

With extended_bounds setting, you now can "force" the histogram aggregation to start building buckets on a specific min value and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if min_doc_count is greater than 0).

But I believe the specific min value should be one of 0, interval, 2 * interval, 3 * interval, ..., rather than the arbitrary value I used in my question.
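The snapping behavior can be sketched with a few lines of Java (a minimal sketch of the bucket-key arithmetic; the class and method names here are my own, not part of any Elasticsearch API):

```java
public class BucketKeyDemo {

    // Bucket key formula for the histogram aggregation:
    // key = floor((value - offset) / interval) * interval + offset
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        long interval = 86400000L; // one day in millis

        // Without an offset, extended_bounds.min (1563120000000) snaps
        // down to the nearest multiple of the interval:
        System.out.println(bucketKey(1563120000000L, interval, 0L));
        // -> 1563062400000, the unexpected first key in the response

        // With offset = 57600000 (= 1563120000000 % 86400000),
        // the min is itself a bucket boundary:
        System.out.println(bucketKey(1563120000000L, interval, 57600000L));
        // -> 1563120000000
    }
}
```

This is why the first bucket key was 1563062400000: the bound is rounded down to a multiple of the interval before buckets are built.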

So basically, in my case, I can use the histogram's offset to fix the problem, as shown below.

I actually don't need date_histogram at all.

"histogram": {
  "field": "eventTime",
  "interval": 86400000,
  "offset": 57600000,
  "min_doc_count": 0,
  "extended_bounds": {
    "min": 1563120000000,
    "max": 1565712000000
  }
}
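As a sanity check, the offset value can be derived directly from the bound and the interval (OffsetDemo is just an illustrative name for this snippet):

```java
public class OffsetDemo {
    public static void main(String[] args) {
        long interval = 86400000L;   // one day in millis
        long min = 1563120000000L;   // extended_bounds.min

        // The remainder of min within a day is the offset to apply:
        long offset = min % interval;
        System.out.println(offset);            // 57600000

        // 57600000 ms is exactly 16 hours, hence "+16h" for date_histogram:
        System.out.println(offset / 3600000L); // 16
    }
}
```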

A crystal-clear explanation from Elasticsearch team member @polyfractal (thanks for the detailed explanation) proves the same logic; more details can be found here.

I'd like to quote the design rationale here:

if we cut the aggregation off right at the extended_bounds.min/max, we would generate buckets that are not the full interval and that would break many assumptions about how the histogram works.