ElasticSearch aggregate on a scripted nested field

I have the following mapping in my ElasticSearch index (simplified, as the other fields are irrelevant):

{
  "test": {
    "mappings": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "entities": {
          "type": "nested",
          "properties": {
            "text_property": {
              "type": "text"
            },
            "float_property": {
              "type": "float"
            }
          }
        }
      }
    }
  }
}

The data looks like this (again, simplified):

[
  {
    "name": "a",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.4
      },
      {
        "text_property": "baz",
        "float_property": 0.6
      }
    ]
  },
  {
    "name": "b",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.9
      }
    ]
  },
  {
    "name": "c",
    "entities": [
      {
        "text_property": "foo",
        "float_property": 0.2
      },
      {
        "text_property": "bar",
        "float_property": 0.9
      }
    ]
  }
]

I am trying to run a bucket aggregation on the maximum float_property of each document. For the example above, this is the desired response:

...
{
  "buckets": [
    {
      "key": "0.9",
      "doc_count": 2
    },
    {
      "key": "0.6",
      "doc_count": 1
    }
  ]
}

because the highest nested float_property value is 0.6 for doc a, 0.9 for doc b, and 0.9 for doc c.
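
To make the target concrete, here is a minimal Python sketch of the bucketing I am after, run on the sample data above. This is not Elasticsearch code, and the names `docs` and `max_buckets` are made up for illustration.

```python
from collections import Counter

# Sample data from the question, reduced to the float_property
# values of each document's nested entities.
docs = {
    "a": [0.2, 0.4, 0.6],
    "b": [0.9],
    "c": [0.2, 0.9],
}


def max_buckets(docs):
    """Count documents by the maximum float_property among their entities."""
    return Counter(max(values) for values in docs.values())


buckets = max_buckets(docs)
print(buckets.most_common())  # [(0.9, 2), (0.6, 1)]
```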

I have tried mixing nested aggs and runtime_mappings, but I am not sure in which order to use them, or whether this is even possible.

I created the index with your mapping, changing the type of float_property to double.

PUT testindex
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "entities": {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "text_property": {
            "type": "text"
          },
          "float_property": {
            "type": "double"
          }
        }
      }
    }
  }
}

I added "include_in_parent": true to the "nested" type.

Index the documents:

PUT testindex/_doc/1
{
  "name": "a",
  "entities": [
    {
      "text_property": "foo",
      "float_property": 0.2
    },
    {
      "text_property": "bar",
      "float_property": 0.4
    },
    {
      "text_property": "baz",
      "float_property": 0.6
    }
  ]
}

PUT testindex/_doc/2
{
  "name": "b",
  "entities": [
    {
      "text_property": "foo",
      "float_property": 0.9
    }
  ]
}

PUT testindex/_doc/3
{
  "name": "c",
  "entities": [
    {
      "text_property": "foo",
      "float_property": 0.2
    },
    {
      "text_property": "bar",
      "float_property": 0.9
    }
  ]
}

Then run a terms aggregation on the nested field:

POST testindex/_search
{
  "from": 0,
  "size": 30,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "entities.float_property": {
      "terms": {
        "field": "entities.float_property"
      }
    }
  }
}

The aggregation result is as follows:

"aggregations" : {
    "entities.float_property" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 0.2,
          "doc_count" : 2
        },
        {
          "key" : 0.9,
          "doc_count" : 2
        },
        {
          "key" : 0.4,
          "doc_count" : 1
        },
        {
          "key" : 0.6,
          "doc_count" : 1
        }
      ]
    }
  }
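
Note what these buckets count. As far as I understand include_in_parent, the nested values are also indexed as a flat field on the parent document, so the terms aggregation counts a document once for every distinct value it contains, not once for its maximum value. A rough Python sketch of that counting, which reproduces the numbers above (illustrative names only, not Elasticsearch code):

```python
from collections import Counter

# float_property values per document, as indexed above.
docs = {
    "a": [0.2, 0.4, 0.6],
    "b": [0.9],
    "c": [0.2, 0.9],
}


def flattened_term_buckets(docs):
    """Count each document once per distinct value it contains."""
    counts = Counter()
    for values in docs.values():
        counts.update(set(values))  # a set, so a doc counts once per value
    return counts


term_buckets = flattened_term_buckets(docs)
print(sorted(term_buckets.items()))
# [(0.2, 2), (0.4, 1), (0.6, 1), (0.9, 2)]
```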

I finally figured it out.

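The two things I had not realized are:

  1. You can supply a script instead of a field key to bucket aggregations.
  2. You can access nested values directly using params._source, instead of using a nested query.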

Combining these two things let me write the correct query:

{
  "size": 0,
  "aggs": {
    "max.float_property": {
      "terms": {
        "script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
      }
    }
  }
}
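
For readers less familiar with Painless, the script's logic is roughly the following Python. This is only a sketch: `max_float_property` and `source` are hypothetical names standing in for the script body and params._source. Because the running maximum starts at 0, a document whose values are all negative would land in the 0 bucket; that is fine for my data, but it is an assumption baked into the script.

```python
def max_float_property(source):
    """Mirror of the Painless script: running maximum, starting at 0."""
    largest = 0.0
    for item in source["entities"]:
        if item["float_property"] > largest:
            largest = item["float_property"]
    return largest


# Document "a" from the sample data.
doc_a = {
    "name": "a",
    "entities": [
        {"text_property": "foo", "float_property": 0.2},
        {"text_property": "bar", "float_property": 0.4},
        {"text_property": "baz", "float_property": 0.6},
    ],
}

print(max_float_property(doc_a))  # 0.6
```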

Response:

{
  ...
  "aggregations": {
    "max.float_property": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0.9",
          "doc_count": 2
        },
        {
          "key": "0.6",
          "doc_count": 1
        }
      ]
    }
  }
}

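I am still confused, though, because I thought the correct way to access nested fields was the nested query type. Unfortunately there is very little documentation on this, so I am still unsure whether this is the intended/correct way to aggregate on scripted nested fields.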