ElasticSearch aggregating by a nested field with variable nesting (or over a particular JSON field)
I have the following structure in GET /index-*/_mapping:
"top_field" : {
  "properties" : {
    "dict_key1" : {
      "properties" : {
        "field1" : {...},
        "field2" : {...},
        "field3" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "field4" : {...}
      }
    },
    "dict_key2" : {
      "properties" : {
        "field1" : {...},
        "field2" : {...},
        "field3" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "field4" : {...}
      }
    },
    "dict_key3": ...
  }
}
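In other words, top_field stores a JSON object.
I want to aggregate over 'field3.keyword' regardless of which dict_key* it sits under, something like top_field.*.field3.keyword.
However, I can't get this to work with a terms aggregation, with or without nested. I also tried bucketing by the different dict_key* values, which would be almost as good, but I couldn't get that to work either.
How can I do this?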
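TL;DR I ran into the same problem a while ago (Terms aggregation with nested wildcard path), and it turned out that this is not directly possible because of the way lookups and path accessors are performed.
There is a scripted workaround, though: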
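{
  "size": 0,
  "aggs": {
    "terms_emulator": {
      "scripted_metric": {
        "init_script": "state.keyword_counts = [:]",
        "map_script": """
          // Walk every dict_key* under top_field and count its field3 value.
          def source = params._source['top_field'];
          if (source != null) {
            for (def key : source.keySet()) {
              if (!source[key].containsKey('field3')) continue;
              def field3_kw = source[key]['field3'];
              if (state.keyword_counts.containsKey(field3_kw)) {
                state.keyword_counts[field3_kw] += 1;
              } else {
                state.keyword_counts[field3_kw] = 1;
              }
            }
          }
        """,
        "combine_script": "return state",
        "reduce_script": """
          // Merge the per-shard maps; returning states[0] alone would drop the other shards' counts.
          def merged = [:];
          for (def s : states) {
            if (s == null) continue;
            for (def e : s.keyword_counts.entrySet()) {
              merged[e.getKey()] = merged.getOrDefault(e.getKey(), 0) + e.getValue();
            }
          }
          return ['keyword_counts': merged];
        """
      }
    }
  }
}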
which yields something like:
"aggregations" : {
  "terms_emulator" : {
    "value" : {
      "keyword_counts" : {
        "world" : 1,
        "hello" : 2
      }
    }
  }
}
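To make the map/combine/reduce flow of the scripted metric easier to follow, here is a plain-Python sketch of the same counting logic. It does not talk to Elasticsearch at all: the sample documents, the two pretend shards, and the function names map_docs and reduce_states are made up for illustration.

```python
from collections import Counter


def map_docs(shard_docs):
    """Emulate init_script + map_script for one shard: count field3 values."""
    keyword_counts = Counter()
    for source in shard_docs:
        top_field = source.get("top_field") or {}
        # Visit every dict_key* entry, whatever its name is.
        for entry in top_field.values():
            if isinstance(entry, dict) and "field3" in entry:
                keyword_counts[entry["field3"]] += 1
    return {"keyword_counts": keyword_counts}


def reduce_states(states):
    """Emulate reduce_script: merge the per-shard states into one result."""
    merged = Counter()
    for state in states:
        merged.update(state["keyword_counts"])  # adds counts key by key
    return {"keyword_counts": dict(merged)}


# Hypothetical sample data, split across two pretend shards.
shard_a = [{"top_field": {"dict_key1": {"field3": "hello"},
                          "dict_key2": {"field3": "world"}}}]
shard_b = [{"top_field": {"dict_key7": {"field3": "hello"},
                          "dict_key9": {"field1": "x"}}}]

result = reduce_states([map_docs(shard_a), map_docs(shard_b)])
print(result)
```

Note that with more than one shard the reduce step has to merge all per-shard maps; taking only the first state would silently lose the counts collected on the other shards.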
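While this works just fine, I wouldn't recommend running scripts like this in production. You'd be better off restructuring your data so that a direct field lookup becomes possible. For example: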
{
  "top_field": {
    "entries": [
      {
        "group_name": "dict_key1",
        "key_value_pairs": {
          "field3": "hello"
        }
      },
      {
        "group_name": "dict_key2",
        "key_value_pairs": {
          "field3": "world"
        }
      }
    ]
  }
}
and make entries a nested field. You could possibly even drop top_field, since it looks redundant, and start directly from entries.
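As a rough sketch of what that restructuring could look like, the Python below reshapes a document from the dict_key* layout into the entries layout, and spells out a possible mapping and aggregation body as plain dicts that serialize to the JSON request bodies. The function name restructure, the aggregation names by_field3 and by_group, and the exact mapping are my assumptions, not something taken from your index; the mapping uses the typeless format of Elasticsearch 7 and later.

```python
import json


def restructure(source):
    """Turn {"top_field": {"dict_key1": {...}, ...}} into the entries layout."""
    entries = [
        {"group_name": group_name, "key_value_pairs": fields}
        for group_name, fields in source.get("top_field", {}).items()
    ]
    return {"top_field": {"entries": entries}}


# Assumed mapping: entries is declared as nested; group_name is a keyword.
mapping = {
    "mappings": {"properties": {"top_field": {"properties": {"entries": {
        "type": "nested",
        "properties": {
            "group_name": {"type": "keyword"},
            "key_value_pairs": {"properties": {"field3": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }}},
        },
    }}}}}
}

# Nested aggregation: a terms agg on field3.keyword, plus one per group_name.
aggregation = {
    "size": 0,
    "aggs": {"entries": {
        "nested": {"path": "top_field.entries"},
        "aggs": {
            "by_field3": {"terms": {
                "field": "top_field.entries.key_value_pairs.field3.keyword"}},
            "by_group": {"terms": {"field": "top_field.entries.group_name"}},
        },
    }},
}

old_doc = {"top_field": {"dict_key1": {"field3": "hello"},
                         "dict_key2": {"field3": "world"}}}
new_doc = restructure(old_doc)
print(json.dumps(new_doc, indent=2))
```

With this layout the field path is fixed, so an ordinary terms aggregation inside a nested aggregation can replace the script, and bucketing by group_name covers the "bucket by dict_key*" variant from the question.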