Elasticsearch aggregation on date range

Question

Assume the following mapping for employees' project assignments:

{
    "Project":
    {
         "startDate":{"type":"date"},
         "endDate":{"type":"date"},
         "employees":{"type":"keyword"}
    }
}

Sample data:

{
    "Project1":
    {
         "startDate":"2019-07-01",
         "endDate":"2019-07-03",
         "employees":["emp1","emp2"]
    }
},
{
    "Project2":
    {
         "startDate":"2019-07-02",
         "endDate":"2019-07-04",
         "employees":["emp3","emp4"]
    }
}

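Here, employees holds the list of employees working on that project. I am unable to write an aggregation query that gives me the number of employees working on each day, i.e. employees working on any one or more projects where startDate <= currentDate and endDate >= currentDate.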

I want the following result:

[
    {
     "key":"2019-07-01",
     "EmployeeCount":2
    },
    {
     "key":"2019-07-02",
     "EmployeeCount":4
    },
    {
     "key":"2019-07-03",
     "EmployeeCount":4
    },
    {
     "key":"2019-07-04",
     "EmployeeCount":2
    }
]
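For reference, the expected numbers can be reproduced outside Elasticsearch with a brute-force check. This is a minimal Python sketch, not an Elasticsearch query: it walks every day covered by any project and counts the distinct employees whose project satisfies startDate <= day <= endDate. The `projects` list just restates the sample data, and `employees_per_day` is a hypothetical helper name.

```python
from datetime import date, timedelta

# The two sample projects from the question, restated as plain Python data.
projects = [
    {"startDate": "2019-07-01", "endDate": "2019-07-03", "employees": ["emp1", "emp2"]},
    {"startDate": "2019-07-02", "endDate": "2019-07-04", "employees": ["emp3", "emp4"]},
]


def employees_per_day(projects):
    """Return one {key, EmployeeCount} entry per day covered by any project."""
    working = {}  # day (yyyy-MM-dd) -> set of employee ids active that day
    for p in projects:
        start = date.fromisoformat(p["startDate"])
        end = date.fromisoformat(p["endDate"])
        day = start
        while day <= end:  # inclusive range: startDate <= day <= endDate
            working.setdefault(day.isoformat(), set()).update(p["employees"])
            day += timedelta(days=1)
    # Sets make an employee on several overlapping projects count only once.
    return [{"key": d, "EmployeeCount": len(working[d])} for d in sorted(working)]


print(employees_per_day(projects))
```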

Can you guide me on which aggregation could help me solve this?

Answer 1

I'm afraid what you are looking for is not possible.

Elasticsearch does not support a date histogram aggregation that takes its values from two different date fields, which in your case are startDate and endDate.

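The only way to get what you want is to:

1. Run a date histogram aggregation on startDate.
2. Run a date histogram aggregation on endDate.
3. Handle the logic of summing the results in your service layer.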
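The first two steps run the same aggregation twice, changing only the date field. A minimal Python sketch of building both request bodies, assuming you send them to `POST mysampleindex/_search` with whatever HTTP or Elasticsearch client you already use; `daily_histogram_body` is a hypothetical helper. It keeps `"interval": "day"` as in the query in this answer; on Elasticsearch 7.2 and later that parameter is deprecated in favour of `calendar_interval`.

```python
import json


def daily_histogram_body(date_field):
    """Build a search body that buckets documents per day on `date_field`
    and counts the project.employees values in each bucket."""
    return {
        "size": 0,
        "aggs": {
            "mydates": {
                "date_histogram": {
                    "field": date_field,
                    "interval": "day",  # "calendar_interval" on 7.2+
                    "format": "yyyy-MM-dd",
                },
                "aggs": {
                    "emp_count": {"value_count": {"field": "project.employees"}}
                },
            }
        },
    }


# The first two steps: the same aggregation, once per date field.
start_body = daily_histogram_body("project.startDate")
end_body = daily_histogram_body("project.endDate")
print(json.dumps(end_body, indent=2))
```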

Below is a sample aggregation that counts employees per startDate.

Sample index

PUT mysampleindex
{
  "mappings": {
    "properties": {
      "project": {
        "properties": {
          "startDate": {
            "type": "date"
          },
          "endDate": {
            "type": "date"
          },
          "employees": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Sample documents

POST mysampleindex/_doc/1
{
    "project":
    {
         "startDate":"2019-07-01",
         "endDate":"2019-07-03",
         "employees":["emp1","emp2"]
    }
}

POST mysampleindex/_doc/2
{
    "project":
    {
         "startDate":"2019-07-02",
         "endDate":"2019-07-04",
         "employees":["emp3","emp4"]
    }
}

Aggregation query:

POST mysampleindex/_search
{
  "size": 0, 
  "aggs": {
    "mydates": {
      "date_histogram": {
        "field": "project.startDate",
        "interval": "day",
        "format" : "yyyy-MM-dd"
      },
      "aggs": {
        "emp_count": {
          "value_count": {
            "field": "project.employees"
          }
        }
      }
    }
  }
}

Note that I used a date histogram aggregation with day as the interval, along with a value_count aggregation on employees as its sub-aggregation.
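To make it concrete what this aggregation computes, here is a rough local model in Python (an illustration only, not how Elasticsearch executes it): each document lands in the bucket for its startDate day, and value_count adds up the number of employees values in that bucket. Because value_count counts values rather than distinct terms, an employee listed on two projects that start the same day would be counted twice. The `docs` list restates the two sample documents, and `simulate_histogram` is a hypothetical name.

```python
from collections import defaultdict

# The two sample documents indexed above (only the fields the aggregation reads).
docs = [
    {"project": {"startDate": "2019-07-01", "employees": ["emp1", "emp2"]}},
    {"project": {"startDate": "2019-07-02", "employees": ["emp3", "emp4"]}},
]


def simulate_histogram(docs, field):
    """Approximate a daily date_histogram with a value_count sub-aggregation.

    Dates are assumed to already be yyyy-MM-dd strings. Unlike the real
    date_histogram, days with no documents produce no bucket here.
    """
    buckets = defaultdict(lambda: {"doc_count": 0, "emp_count": 0})
    for doc in docs:
        day = doc["project"][field]
        buckets[day]["doc_count"] += 1
        # value_count counts every value, so duplicates are not collapsed.
        buckets[day]["emp_count"] += len(doc["project"]["employees"])
    return dict(sorted(buckets.items()))


print(simulate_histogram(docs, "startDate"))
```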

Query result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "mydates" : {
      "buckets" : [
        {
          "key_as_string" : "2019-07-01",
          "key" : 1561939200000,
          "doc_count" : 1,
          "emp_count" : {                        <---- Count of employees
            "value" : 2
          }
        },
        {
          "key_as_string" : "2019-07-02",
          "key" : 1562025600000,
          "doc_count" : 1,
          "emp_count" : {                         <---- Count of employees
            "value" : 2
          }
        }
      ]
    }
  }
}
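In the service layer you will usually want these buckets as a simple day-to-count map. A sketch in Python, assuming the response body has already been decoded into a dict. The JSON below is the aggregation part of the result above with the `<----` annotations removed, since those are not valid JSON. `buckets_to_counts` is a hypothetical helper.

```python
import json

# The aggregation part of the response above (annotations removed).
response = json.loads("""
{
  "aggregations": {
    "mydates": {
      "buckets": [
        {"key_as_string": "2019-07-01", "key": 1561939200000,
         "doc_count": 1, "emp_count": {"value": 2}},
        {"key_as_string": "2019-07-02", "key": 1562025600000,
         "doc_count": 1, "emp_count": {"value": 2}}
      ]
    }
  }
}
""")


def buckets_to_counts(resp, agg_name="mydates", metric="emp_count"):
    """Map each bucket's formatted date (key_as_string) to its metric value."""
    return {
        b["key_as_string"]: int(b[metric]["value"])
        for b in resp["aggregations"][agg_name]["buckets"]
    }


print(buckets_to_counts(response))
```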

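You can count employees by endDate in a similar way (just replace startDate with endDate in the aggregation query above). Once you have both results, you can combine them in your service layer to get what you are looking for.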
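One possible way to do that combination, sketched under my own assumptions rather than as the only approach, is to treat the startDate histogram as "employees joining" on each day and the endDate histogram as "employees leaving after" each day, then take a running sum. An assignment counts on day d if its startDate <= d and its endDate is not before d, so active(d) = (sum of start counts on days <= d) - (sum of end counts on days < d). Both inputs come from value_count, so this counts employee-project assignments: an employee on two overlapping projects is counted twice. The function name and the day-to-count input format are hypothetical, and the `ends` values are derived by hand from the sample documents, not taken from an actual response.

```python
from datetime import date, timedelta


def active_per_day(starts, ends):
    """Combine two {yyyy-MM-dd: count} histograms into per-day active counts.

    starts: employee counts bucketed by project.startDate
    ends:   employee counts bucketed by project.endDate
    """
    days = sorted(set(starts) | set(ends))
    if not days:
        return []
    day, last = date.fromisoformat(days[0]), date.fromisoformat(days[-1])
    result = []
    active = 0
    ended_yesterday = 0  # endDate is inclusive, so an end only applies the next day
    while day <= last:
        key = day.isoformat()
        active += starts.get(key, 0) - ended_yesterday
        result.append({"key": key, "EmployeeCount": active})
        ended_yesterday = ends.get(key, 0)
        day += timedelta(days=1)
    return result


# startDate buckets from the query result above; endDate buckets derived
# from the sample documents (project 1 ends 07-03, project 2 ends 07-04).
starts = {"2019-07-01": 2, "2019-07-02": 2}
ends = {"2019-07-03": 2, "2019-07-04": 2}
print(active_per_day(starts, ends))
```

With the sample data this reproduces the 2, 4, 4, 2 sequence from the question, because no employee appears on both projects.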

It's not technically a direct solution, but I hope this helps!
