Histogram does not start at the right min even though a filter is added

Mapping

          "eventTime": {
            "type": "long"
          },

Query

POST some_indices/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "eventTime": {
            "from": 1563120000000,
            "to": 1565712000000,
            "format": "epoch_millis"
          }
        }
      }
    }
  },
  "aggs": {
    "min_eventTime": { "min": { "field": "eventTime" } },
    "max_eventTime": { "max": { "field": "eventTime" } },
    "time_series": {
      "histogram": {
        "field": "eventTime",
        "interval": 86400000,
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 1563120000000,
          "max": 1565712000000
        }
      }
    }
  }
}

Response

"aggregations": {
    "max_eventTime": {
      "value": 1565539199997
    },
    "min_eventTime": {
      "value": 1564934400000
    },
    "time_series": {
      "buckets": [
        {
          "key": 1563062400000,
          "doc_count": 0
        },
        {
          "key": 1563148800000,
          "doc_count": 0
        },
        {
        ...

Question

As clearly mentioned in the reference:

For filtering buckets, one should nest the histogram aggregation under a range filter aggregation with the appropriate from/to settings.

I set up the filter correctly (just as the demo does), and the min/max aggregations provide evidence of that.

But why does the first key start at 1563062400000 rather than at from (or min_eventTime)?

So weird. I'm totally lost now ;(

Any advice would be appreciated ;)

References

I have now found a solution, but I feel it is a bug in Elasticsearch.

I am using date_histogram even though the field itself is a long, and with offset I shift the starting point forward to the right timestamp.

  "aggs": {
    "time_series": {
      "date_histogram": {
        "field": "eventTime",
        "interval": 86400000,
        "offset": "+16h",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": 1563120000000,
          "max": 1565712000000
        }
      },
      "aggs": {
        "order_amount_total": {
          "sum": {
            "field": "order_amount"
          }
        }
      }
    }
  }

Updated

Thanks to @Val's help, I rethought it and ran the following test:

@Test
public void testComputation() {
    System.out.println(1563120000000L % 86400000L); // 57600000
    System.out.println(1563062400000L % 86400000L); // 0
}

I'd like to quote the docs:

With extended_bounds setting, you now can "force" the histogram aggregation to start building buckets on a specific min value and also keep on building buckets up to a max value (even if there are no documents anymore). Using extended_bounds only makes sense when min_doc_count is 0 (the empty buckets will never be returned if min_doc_count is greater than 0).

But I believe the specific min value should be one of 0, interval, 2 * interval, 3 * interval, ..., rather than the arbitrary value I used in my question.
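The snapping behavior can be sketched with a few lines of Java (a minimal sketch of the bucket-key arithmetic; the class and method names here are my own, not part of any Elasticsearch API):

```java
public class BucketKeyDemo {

    // Bucket key formula for the histogram aggregation:
    // key = floor((value - offset) / interval) * interval + offset
    static long bucketKey(long value, long interval, long offset) {
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }

    public static void main(String[] args) {
        long interval = 86400000L; // one day in millis

        // Without an offset, extended_bounds.min (1563120000000) snaps
        // down to the nearest multiple of the interval:
        System.out.println(bucketKey(1563120000000L, interval, 0L));
        // -> 1563062400000, the unexpected first key in the response

        // With offset = 57600000 (= 1563120000000 % 86400000),
        // the min is itself a bucket boundary:
        System.out.println(bucketKey(1563120000000L, interval, 57600000L));
        // -> 1563120000000
    }
}
```

This is why the first bucket key was 1563062400000: the bound is rounded down to a multiple of the interval before buckets are built.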

So basically, in my case, I can use the histogram's offset to fix the problem, as shown below.

I actually don't need date_histogram at all.

"histogram": {
  "field": "eventTime",
  "interval": 86400000,
  "offset": 57600000,
  "min_doc_count": 0,
  "extended_bounds": {
    "min": 1563120000000,
    "max": 1565712000000
  }
}
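As a sanity check, the offset value can be derived directly from the bound and the interval (OffsetDemo is just an illustrative name for this snippet):

```java
public class OffsetDemo {
    public static void main(String[] args) {
        long interval = 86400000L;   // one day in millis
        long min = 1563120000000L;   // extended_bounds.min

        // The remainder of min within a day is the offset to apply:
        long offset = min % interval;
        System.out.println(offset);            // 57600000

        // 57600000 ms is exactly 16 hours, hence "+16h" for date_histogram:
        System.out.println(offset / 3600000L); // 16
    }
}
```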

A crystal-clear explanation from Elasticsearch team member @polyfractal (thanks for the detailed explanation) proves the same logic; more details can be found here.

I'd like to quote the design rationale here:

if we cut the aggregation off right at the extended_bounds.min/max, we would generate buckets that are not the full interval and that would break many assumptions about how the histogram works.