Elasticsearch nested aggregation with filters and query

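I want to compute the median of a value in a nested field. The nested field holds a list of objects with several attributes, and I want to filter some of them out before computing the median. For example, say the nested field holds 10 objects, but only 7 of those 10 should be used to compute the median.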

query_median = {
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "date": "2020-05-18"
                    }
                },
                {
                    "term": {
                        "group_name": "some_name"
                    }
                }
            ]
        }
    },
    "aggs": {
        "median_value": {
            "nested": {
                "path": "people"
            },
            "aggs": {
                "median": {
                    "percentiles": {
                        "field": "people.for_median_attr",
                        "percents": [50]
                    }
                }
            }
        }
    }
}

The query above works, but it has no filter. When I add an extra filter inside aggs, I get the same value as without any filter. Here is what I tried:

query_median = {
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "date": "2020-05-18"
                    }
                },
                {
                    "term": {
                        "group_name": "some_name"
                    }
                }
            ]
        }
    },
    "aggs": {
        "median_value": {
            "nested": {
                "path": "people"
            },
            "aggs": {
                "filter_out": {
                    "filter": {
                        "bool": {
                            "must": [
                                {
                                    "term": {
                                        "people.attr_not_wanted1": False
                                    },
                                    "term": {
                                        "people.attr_not_wanted2": False
                                    }
                                }
                            ]
                        }
                    },
                    "aggs": {
                        "median": {
                            "percentiles": {
                                "field": "people.for_median_attr",
                                "percents": [50]
                            }
                        }
                    }
                }
            }
        }
    }
}

Sample document:

{
        "_index" : "some_index",
        "_type" : "_doc",
        "_id" : "some_id",
        "_score" : 1.0,
        "_source" : {
          "date" : "2020-05-10",
          "group_name" : "some_name",
          "org_code" : "some_code",
          "people" : [
            {
              "nickname" : "xxx",
              "review_count" : 20.0,
              "not_wanted_1" : false,
              "not_wanted_2" : false
            },
            {
              "nickname" : "yyy",
              "review_count" : 18.0,
              "not_wanted_1" : false,
              "not_wanted_2" : false
            },
            {
              "nickname" : "zzz",
              "value_for_median" : 11.0,
              "not_wanted_1" : true,
              "not_wanted_2" : true
            },
            ...
          ]
        }
      }

In this case, the median is computed from only two numbers: 20 and 18.
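As a local cross-check of the value being asked for, here is a minimal Python sketch that applies the intended filter to the three `people` entries from the sample document and takes the median with the standard library. It assumes the value lives in `review_count` for the first two entries, as written in the sample. It does not touch Elasticsearch, whose `percentiles` aggregation is approximate in general.

```python
from statistics import median

# The three nested objects from the sample document.
people = [
    {"nickname": "xxx", "review_count": 20.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "yyy", "review_count": 18.0, "not_wanted_1": False, "not_wanted_2": False},
    {"nickname": "zzz", "value_for_median": 11.0, "not_wanted_1": True, "not_wanted_2": True},
]

# Keep only the objects where both flags are false.
kept = [p for p in people if not p["not_wanted_1"] and not p["not_wanted_2"]]

# Median of the remaining values (20.0 and 18.0).
expected_median = median(p["review_count"] for p in kept)
print(expected_median)  # 19.0
```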

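You're almost there. You are just missing a few curly braces in the nested filter: each term clause needs its own object inside must. You should also match on true rather than false, since you want to keep those nested documents to compute their median.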
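To see why the braces matter, assuming the query is built as a Python dict (as `query_median = {...}` and `False` in the question suggest): a dict literal with two `"term"` keys keeps only the last one, so one of the two conditions is silently dropped. A small sketch, with the variable names `merged` and `separate` made up for illustration:

```python
# Both "term" clauses in one dict: the second key silently replaces the first.
merged = {
    "term": {"people.attr_not_wanted1": False},
    "term": {"people.attr_not_wanted2": False},
}
print(merged)  # {'term': {'people.attr_not_wanted2': False}}

# One dict per clause keeps both conditions in the "must" list.
separate = [
    {"term": {"people.attr_not_wanted1": False}},
    {"term": {"people.attr_not_wanted2": False}},
]
print(len(merged), len(separate))  # 1 2
```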

Your query should look like this:

{
  "query": {
     ...
  },
  "aggs": {
    "median_value": {
      "nested": {
        "path": "people"
      },
      "aggs": {
        "filter_out": {
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "people.not_wanted_1": true
                  }
                },
                {
                  "term": {
                    "people.not_wanted_2": true
                  }
                }
              ]
            }
          },
          "aggs": {
            "median": {
              "percentiles": {
                "field": "people.value_for_median",
                "percents": [
                  50
                ]
              }
            }
          }
        }
      }
    }
  }
}
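If you build the request in Python, as in the question, the same body can be assembled as a dict. The sketch below is only an illustration: `build_median_query` and its `flag_value` parameter are hypothetical names, the `query` part is taken from the question, and the field names follow the sample document.

```python
import json


def build_median_query(date, group_name, flag_value=True):
    """Hypothetical helper: assemble the request body as a Python dict."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"date": date}},
                    {"term": {"group_name": group_name}},
                ]
            }
        },
        "aggs": {
            "median_value": {
                "nested": {"path": "people"},
                "aggs": {
                    "filter_out": {
                        "filter": {
                            "bool": {
                                # One dict per term clause, so neither is overwritten.
                                "must": [
                                    {"term": {"people.not_wanted_1": flag_value}},
                                    {"term": {"people.not_wanted_2": flag_value}},
                                ]
                            }
                        },
                        "aggs": {
                            "median": {
                                "percentiles": {
                                    "field": "people.value_for_median",
                                    "percents": [50],
                                }
                            }
                        },
                    }
                },
            }
        },
    }


query_median = build_median_query("2020-05-18", "some_name")
print(json.dumps(query_median, indent=2))
```

Passing `flag_value=False` instead selects the nested objects whose two flags are both false.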

Result:

  "aggregations" : {
    "median_value" : {
      "doc_count" : 3,
      "filter_out" : {
        "doc_count" : 1,
        "median" : {
          "values" : {
            "50.0" : 11.0
          }
        }
      }
    }
  }
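Reading the median out of that response in Python is a matter of following the aggregation names down the dict. A minimal sketch, with the response fragment copied from the result shown here and assuming the client returns it as a plain dict:

```python
# Response fragment copied from the result.
response = {
    "aggregations": {
        "median_value": {
            "doc_count": 3,
            "filter_out": {
                "doc_count": 1,
                "median": {"values": {"50.0": 11.0}},
            },
        }
    }
}

# Walk down: nested aggregation -> filter aggregation -> percentiles.
filtered = response["aggregations"]["median_value"]["filter_out"]
median_value = filtered["median"]["values"]["50.0"]
print(filtered["doc_count"], median_value)  # 1 11.0
```

The outer `doc_count` (3) counts all nested `people` objects, while the inner one (1) counts those that passed the filter.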

According to the documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html, you could try changing the 'filter_out' part of the query to:

    "filter_out" : {
      "filters" : {
        "filters" : [
          { "term" : { "people.attr_not_wanted1" : false   }},
          { "term" : { "people.attr_not_wanted2" : false }}
        ]
      }
    }