ElasticSearch aggregate on a scripted nested field
I have the following mapping in my ElasticSearch index (simplified, since the other fields are irrelevant):
{
"test": {
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"entities": {
"type": "nested",
"properties": {
"text_property": {
"type": "text"
},
"float_property": {
"type": "float"
}
}
}
}
}
}
}
The data looks like this (again simplified):
[
{
"name": "a",
"entities": [
{
"text_property": "foo",
"float_property": 0.2
},
{
"text_property": "bar",
"float_property": 0.4
},
{
"text_property": "baz",
"float_property": 0.6
}
]
},
{
"name": "b",
"entities": [
{
"text_property": "foo",
"float_property": 0.9
}
]
},
{
"name": "c",
"entities": [
{
"text_property": "foo",
"float_property": 0.2
},
{
"text_property": "bar",
"float_property": 0.9
}
]
}
]
I'm trying to run a bucket aggregation on the maximum float_property of each document. So for the example above, this is the desired response:
...
{
"buckets": [
{
"key": "0.9",
"doc_count": 2
},
{
"key": "0.6",
"doc_count": 1
}
]
}
since the highest nested float_property value is 0.6 for doc a, 0.9 for doc b, and 0.9 for doc c.
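To pin down the desired semantics, here is a minimal Python sketch that runs outside Elasticsearch: it takes the maximum float_property of each document and counts documents per maximum. The `docs` list mirrors the sample data above; `max_buckets` is a hypothetical helper, not part of any Elasticsearch API, and the sort only approximates a terms aggregation's default order (doc_count descending, ties by key).

```python
from collections import Counter

# Sample documents, copied from the data above.
docs = [
    {"name": "a", "entities": [{"text_property": "foo", "float_property": 0.2},
                               {"text_property": "bar", "float_property": 0.4},
                               {"text_property": "baz", "float_property": 0.6}]},
    {"name": "b", "entities": [{"text_property": "foo", "float_property": 0.9}]},
    {"name": "c", "entities": [{"text_property": "foo", "float_property": 0.2},
                               {"text_property": "bar", "float_property": 0.9}]},
]


def max_buckets(documents):
    """Count documents per highest nested float_property (hypothetical helper)."""
    # One value per document: the maximum of its nested float_property values.
    maxima = [max(e["float_property"] for e in d["entities"])
              for d in documents if d["entities"]]  # skip docs with no entities
    counts = Counter(maxima)
    # doc_count descending, ties broken by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(max_buckets(docs))  # [(0.9, 2), (0.6, 1)]
```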
I've tried combining nested and aggs, as well as runtime_mappings, but I'm not sure in which order to use them, or whether this is even possible.
I created the index with your mapping, changing the float_property type to double.
PUT testindex
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"entities": {
"type": "nested",
"include_in_parent": true,
"properties": {
"text_property": {
"type": "text"
},
"float_property": {
"type": "double"
}
}
}
}
}
}
Added "include_in_parent": true to the "nested" type.
Index the documents:
PUT testindex/_doc/1
{
"name": "a",
"entities": [
{
"text_property": "foo",
"float_property": 0.2
},
{
"text_property": "bar",
"float_property": 0.4
},
{
"text_property": "baz",
"float_property": 0.6
}
]
}
PUT testindex/_doc/2
{
"name": "b",
"entities": [
{
"text_property": "foo",
"float_property": 0.9
}
]
}
PUT testindex/_doc/3
{
"name": "c",
"entities": [
{
"text_property": "foo",
"float_property": 0.2
},
{
"text_property": "bar",
"float_property": 0.9
}
]
}
Then run a terms aggregation on the nested field:
POST testindex/_search
{
"from": 0,
"size": 30,
"query": {
"match_all": {}
},
"aggregations": {
"entities.float_property": {
"terms": {
"field": "entities.float_property"
}
}
}
}
The aggregation result is:
"aggregations" : {
"entities.float_property" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 0.2,
"doc_count" : 2
},
{
"key" : 0.9,
"doc_count" : 2
},
{
"key" : 0.4,
"doc_count" : 1
},
{
"key" : 0.6,
"doc_count" : 1
}
]
}
}
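Note that this result counts documents per individual nested value, not per document maximum: with include_in_parent, each parent document also carries a flat copy of every entities.float_property value, so 0.2 is counted for both a and c. A minimal Python sketch of that counting, assuming a document counts once per distinct value however often it repeats; `flat_terms_counts` is a hypothetical helper.

```python
from collections import Counter

# Sample documents, copied from the indexing requests above.
docs = [
    {"name": "a", "entities": [{"float_property": 0.2},
                               {"float_property": 0.4},
                               {"float_property": 0.6}]},
    {"name": "b", "entities": [{"float_property": 0.9}]},
    {"name": "c", "entities": [{"float_property": 0.2},
                               {"float_property": 0.9}]},
]


def flat_terms_counts(documents):
    """Mimic a terms agg over values flattened into the parent (hypothetical)."""
    counts = Counter()
    for d in documents:
        # The set makes each parent document count at most once per value.
        counts.update({e["float_property"] for e in d["entities"]})
    # doc_count descending, ties broken by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(flat_terms_counts(docs))  # [(0.2, 2), (0.9, 2), (0.4, 1), (0.6, 1)]
```

The per-value buckets cannot be turned back into per-document maxima, which is why a field-based terms aggregation alone does not answer the question.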
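I finally figured it out.
The two things I hadn't realized were:
- You can give a bucket aggregation a script key instead of a field key.
- You can access nested values directly through params._source instead of using a nested query.
Combining these two let me write the correct query: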
{
"size": 0,
"aggs": {
"max.float_property": {
"terms": {
"script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
}
}
}
}
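For readers less familiar with Painless, here is a line-by-line Python transliteration of the script above. It is a sketch for reasoning about the logic, not how Elasticsearch executes it; `script_max` is a hypothetical name. It also makes one property of the script explicit: because max starts at 0, a document whose entities list is empty, or whose values are all negative, lands in the 0 bucket. The bucketing step uses `str()` only to match the string keys such as "0.9" shown in the response below.

```python
from collections import Counter


def script_max(source):
    """Python transliteration of the Painless terms script (hypothetical helper)."""
    max_value = 0.0  # Painless: double max = 0;
    for item in source["entities"]:  # for (item in params._source.entities)
        if item["float_property"] > max_value:
            max_value = item["float_property"]
    return max_value  # return max;


# Sample _source documents, copied from the data above.
docs = [
    {"name": "a", "entities": [{"float_property": 0.2},
                               {"float_property": 0.4},
                               {"float_property": 0.6}]},
    {"name": "b", "entities": [{"float_property": 0.9}]},
    {"name": "c", "entities": [{"float_property": 0.2},
                               {"float_property": 0.9}]},
]

# One script value per document, then count documents per value.
buckets = Counter(str(script_max(d)) for d in docs)
print(buckets)  # Counter({'0.9': 2, '0.6': 1})
```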
Response:
{
...
"aggregations": {
"max.float_property": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "0.9",
"doc_count": 2
},
{
"key": "0.6",
"doc_count": 1
}
]
}
}
}
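I'm still a bit confused, though, because I thought the correct way to access nested fields was the nested query type. Unfortunately, the documentation on this is sparse, so I'm still not sure whether this is the intended/correct way to aggregate on a scripted nested field.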