How to find the distinct values in the array in all the indexes using elasticsearch-dsl?

I am using elasticsearch-dsl in Django. I have defined a DocType document with a keyword field that contains a list of values.


from elasticsearch_dsl import DocType, Text, Keyword

class ProductIndex(DocType):
    """Index for products."""
    id = Keyword()
    slug = Keyword()
    name = Text()
    filter_list = Keyword()

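Here filter_list is an array holding multiple values. I also have a list of distinct values, say sample_filter_list, some of whose elements may appear in the filter_list array of some products. Given this sample_filter_list, I want all the unique elements of filter_list across every product whose filter_list has a non-empty intersection with sample_filter_list.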

For example, suppose I have 5 products whose filter_list values are:
1) ['a', 'b', 'c']
2) ['d', 'e', 'f']
3) ['g', 'h', 'i']
4) ['j', 'k', 'l']
5) ['m', 'n', 'o']
If my sample_filter_list is ['a', 'd', 'g', 'j', 'm'], then Elasticsearch should return an array containing the distinct elements,
i.e. ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o']
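To pin down the requirement, here is a minimal plain-Python sketch of the expected result, with no Elasticsearch involved. The function name expected_distinct_values is made up for illustration; it only restates the rule above (keep a product if its filter_list shares at least one value with sample_filter_list, then take the union of the kept lists).

```python
def expected_distinct_values(products, sample_filter_list):
    """Union of every filter_list that shares at least one value with the sample."""
    sample = set(sample_filter_list)
    distinct = set()
    for filter_list in products:
        # Keep the product only if the intersection is non-empty.
        if sample.intersection(filter_list):
            distinct.update(filter_list)
    return sorted(distinct)


products = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
    ["j", "k", "l"],
    ["m", "n", "o"],
]
result = expected_distinct_values(products, ["a", "d", "g", "j", "m"])
print(result)
```

With a shorter sample such as ['a'], only the first product matches, so the result would be just its three values.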

I am a bit confused about what you want. Querying only the products whose filter_list intersects with sample_filter_list is just a matter of running a terms query:

ProductIndex.search().filter('terms', filter_list=sample_filter_list)
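If the goal is also to collect the distinct filter_list values of the matching products, the same terms filter can be paired with a terms aggregation on that field. Below is a rough sketch that builds the request body as a plain dict; the helper name build_distinct_values_body, the aggregation name distinct_filters, and the default size of 10000 are assumptions chosen for illustration, not something from the question.

```python
import json


def build_distinct_values_body(sample_filter_list, max_values=10000):
    """Build a search body: terms filter plus terms aggregation on filter_list."""
    return {
        # No hits are needed, only the aggregation buckets.
        "size": 0,
        "query": {
            "bool": {
                # Match products whose filter_list contains any sample value.
                "filter": [{"terms": {"filter_list": list(sample_filter_list)}}]
            }
        },
        "aggs": {
            # Hypothetical aggregation name; one bucket per distinct value.
            "distinct_filters": {
                "terms": {"field": "filter_list", "size": max_values}
            }
        },
    }


body = build_distinct_values_body(["a", "d", "g", "j", "m"])
print(json.dumps(body, indent=2))
```

With elasticsearch-dsl, a dict like this should be usable through Search.from_dict(body), or the equivalent can be chained onto the search above with s.aggs.bucket('distinct_filters', 'terms', field='filter_list', size=10000). Note that the terms aggregation only returns up to size buckets, so the limit has to cover the number of distinct values you expect.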


This answer is not specific to Django but general.
Suppose you have an ES index some_index2 with the following mapping:

PUT some_index2
{
  "mappings": {
    "some_type": {
      "dynamic_templates": [
        {
          "strings": {
            "mapping": { "type": "string" },
            "match_mapping_type": "string"
          }
        }
      ],
      "properties": {
        "field1": { "type": "string" },
        "field2": { "type": "string" }
      }
    }
  }
}

Also suppose you have already inserted some documents.

Now, as you stated, you want all the distinct values of field2 (filter_list in your case). You can get them with an Elasticsearch terms aggregation:

GET some_index2/_search
{
  "aggs": {
    "some_name": {
      "terms": {
        "field": "field2",
        "size": 10000
      }
    }
  },
  "size": 0
}

This gives a result like:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "some_name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "e", "doc_count": 2 },
        { "key": "a", "doc_count": 1 },
        { "key": "b", "doc_count": 1 },
        { "key": "c", "doc_count": 1 },
        { "key": "d", "doc_count": 1 },
        { "key": "f", "doc_count": 1 },
        { "key": "g", "doc_count": 1 },
        { "key": "k", "doc_count": 1 },
        { "key": "l", "doc_count": 1 }
      ]
    }
  }
}

Here buckets contains the list of all the distinct values.
You can iterate through the buckets and read each value from its key field.
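As a small illustration of that last step, here is a sketch in Python that pulls the keys out of a response that has already been parsed into a dict. The function name distinct_keys is hypothetical, and sample_response is a trimmed copy of the output above rather than a real server reply.

```python
def distinct_keys(response, agg_name="some_name"):
    """Return the bucket keys of a terms aggregation, in the order returned."""
    buckets = (
        response.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response shown above (first three buckets only).
sample_response = {
    "aggregations": {
        "some_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "e", "doc_count": 2},
                {"key": "a", "doc_count": 1},
                {"key": "b", "doc_count": 1},
            ],
        }
    }
}
keys = distinct_keys(sample_response)
print(keys)
```

The keys come back ordered by doc_count (the default for a terms aggregation), so sort them yourself if you need alphabetical order.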

Hope this is what you need.