ElasticSearch nested aggregation with filters and query
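I want to compute the median of a value stored in a nested field. The nested field holds a list of objects with several attributes, and I want to filter some of those objects out before computing the median.
For example, say the nested field holds 10 objects, but only 7 of the 10 should be used to compute the median.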
query_median = {
"query": {
"bool": {
"filter": [
{
"term": {
"date": "2020-05-18"
}
},
{
"term": {
"group_name": "some_name"
}
}
]
}
},
"aggs": {
"median_value": {
"nested": {
"path": "people"
},
"aggs": {
"median": {
"percentiles": {
"field": "people.for_median_attr",
"percents": [50]
}
}
}
}
}
}
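The query above works, but it does not filter the nested objects. When I add an extra filter inside aggs, it returns the same value as the query without any filter. Here is what I tried: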
query_median = {
"query": {
"bool": {
"filter": [
{
"term": {
"date": "2020-05-18"
}
},
{
"term": {
"group_name": "some_name"
}
}
]
}
},
"aggs": {
"median_value": {
"nested": {
"path": "people"
},
"aggs": {
"filter_out": {
"filter": {
"bool": {
"must": [
{
"term": {
"people.attr_not_wanted1": False
},
"term": {
"people.attr_not_wanted2": False
}
}
]
}
},
"aggs": {
"median": {
"percentiles": {
"field": "people.for_median_attr",
"percents": [50]
}
}
}
}
}
}
}
}
Sample document:
{
"_index" : "some_index",
"_type" : "_doc",
"_id" : "some_id",
"_score" : 1.0,
"_source" : {
"date" : "2020-05-10",
"group_name" : "some_name",
"org_code" : "some_code",
"people" : [
{
"nickname" : "xxx",
"review_count" : 20.0,
"not_wanted_1" : false,
"not_wanted_2" : false
},
{
"nickname" : "yyy",
"review_count" : 18.0,
"not_wanted_1" : false,
"not_wanted_2" : false
},
{
"nickname" : "zzz",
"value_for_median" : 11.0,
"not_wanted_1" : true,
"not_wanted_2" : true
},
...
]
}
}
In this case, the median would be computed from only two numbers: 20 and 18.
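As a sanity check on the expected number, the intended calculation can be reproduced locally in plain Python. This is only a sketch: `median_of_kept` is a hypothetical helper, and it assumes that `review_count` is the value to take the median of and that an object is dropped when either `not_wanted_*` flag is set, which is how the sample document reads. Elasticsearch's `percentiles` aggregation is approximate in general, so on large data the two results may differ slightly.

```python
from statistics import median

# Nested objects copied from the sample document above.
people = [
    {"nickname": "xxx", "review_count": 20.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "yyy", "review_count": 18.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "zzz", "value_for_median": 11.0, "not_wanted_1": True, "not_wanted_2": True},
]


def median_of_kept(objects, value_field, flag_fields):
    """Median of value_field over the objects whose flags are all false.

    Hypothetical helper for illustration; objects that lack value_field are skipped.
    """
    values = [
        obj[value_field]
        for obj in objects
        if value_field in obj and not any(obj.get(flag) for flag in flag_fields)
    ]
    return median(values) if values else None


expected = median_of_kept(people, "review_count", ["not_wanted_1", "not_wanted_2"])
print(expected)  # 19.0, the midpoint of 18.0 and 20.0
```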
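You are almost there. You are only missing a few curly braces in the nested filter: each term clause has to be wrapped in its own object inside the must array. You should also match on true rather than false, because you want to keep those nested documents and compute their median.
Your query should look like this: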
{
"query": {
...
},
"aggs": {
"median_value": {
"nested": {
"path": "people"
},
"aggs": {
"filter_out": {
"filter": {
"bool": {
"must": [
{
"term": {
"people.not_wanted_1": true
}
},
{
"term": {
"people.not_wanted_2": true
}
}
]
}
},
"aggs": {
"median": {
"percentiles": {
"field": "people.value_for_median",
"percents": [
50
]
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"median_value" : {
"doc_count" : 3,
"filter_out" : {
"doc_count" : 1,
"median" : {
"values" : {
"50.0" : 11.0
}
}
}
}
}
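One reason the missing braces are easy to overlook when the query is written as a Python dict: a dict literal that repeats a key is legal Python, and the later value silently replaces the earlier one. The sketch below shows this with the two `term` clauses from the question; the variable names `collapsed` and `separate` are only for illustration.

```python
# Two "term" keys in ONE dict: valid Python, but the second overwrites the first,
# so only the attr_not_wanted2 condition is left in the request body.
collapsed = [
    {
        "term": {"people.attr_not_wanted1": False},
        "term": {"people.attr_not_wanted2": False},
    }
]

# One dict per clause: both conditions survive.
separate = [
    {"term": {"people.attr_not_wanted1": False}},
    {"term": {"people.attr_not_wanted2": False}},
]

print(len(collapsed), collapsed[0])
print(len(separate))
```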
According to the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html, you could try updating the 'filter_out' part of your query to:
"filter_out" : {
"filters" : {
"filters" : [
{ "term" : { "people.attr_not_wanted1" : false }},
{ "term" : { "people.attr_not_wanted2" : false }}
]
}
}
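For comparison, here is a sketch of a small Python helper that builds the single `filter` form with a `bool`/`must` list, which requires all conditions to hold for the same nested object. As far as the linked documentation describes it, the `filters` aggregation instead produces one bucket per listed filter, so it would not combine the two conditions. The helper name `nested_median_aggs` and the `keep_when` parameter are hypothetical, and the field names are taken from the sample document on the assumption that `review_count` is the value of interest.

```python
import json


def nested_median_aggs(path, value_field, flag_fields, keep_when=False):
    """Build a nested -> filter -> percentiles aggregation body (hypothetical helper)."""
    return {
        "median_value": {
            "nested": {"path": path},
            "aggs": {
                "filter_out": {
                    "filter": {
                        "bool": {
                            # One dict per term clause, so no key is overwritten.
                            "must": [
                                {"term": {f"{path}.{flag}": keep_when}}
                                for flag in flag_fields
                            ]
                        }
                    },
                    "aggs": {
                        "median": {
                            "percentiles": {
                                "field": f"{path}.{value_field}",
                                "percents": [50],
                            }
                        }
                    },
                }
            },
        }
    }


aggs = nested_median_aggs("people", "review_count", ["not_wanted_1", "not_wanted_2"])
print(json.dumps(aggs, indent=2))
```

The resulting dict can be placed under the "aggs" key of the request body next to the existing "query" section.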