ElasticSearch - Combine filters & Composite Query to get unique fields combinations

Well.. I am very much a "newbie" with ES, so when it comes to aggregations... there is no word in the dictionary to describe my level :p

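Today I ran into a problem: I am trying to build a query that does something similar to a SQL DISTINCT, but combined with filters. Given this document (an abstraction of the real case, of course):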

{
  "id": "1",
  "createdAt": 1626783747,
  "updatedAt": 1626783747,
  "isAvailable": true,
  "kind": "document",
  "classification": {
    "id": 1,
    "name": "a_name_for_id_1"
  },
  "structure": {
    "material": "cartoon",
    "thickness": 5
  },
  "shared": true,
  "objective": "Whosebug"
}

All the data in the document above can vary, but some values can repeat across documents, for example classification.id, kind, and structure.material.

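So, to meet my requirement, I would like to "group by" these 3 fields in order to get every unique combination. Going a bit deeper, given the following data, I should get the following possibilities: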

[{
        "id": "1",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 1,
            "name": "a_name_for_id_1"
        },
        "structure": {
            "material": "cartoon",
            "thickness": 5
        },
        "shared": true,
        "objective": "Whosebug"
    },
    {
        "id": "2",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 2,
            "name": "a_name_for_id_2"
        },
        "structure": {
            "material": "iron",
            "thickness": 3
        },
        "shared": true,
        "objective": "linkedin"
    },
    {
        "id": "3",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": false,
        "kind": "document",
        "classification": {
            "id": 2,
            "name": "a_name_for_id_2"
        },
        "structure": {
            "material": "paper",
            "thickness": 1
        },
        "shared": false,
        "objective": "tiktok"
    },
    {
        "id": "4",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "cartoon",
            "thickness": 5
        },
        "shared": false,
        "objective": "snapchat"
    },
    {
        "id": "5",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "paper",
            "thickness": 1
        },
        "shared": true,
        "objective": "twitter"
    },
    {
        "id": "6",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": false,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "iron",
            "thickness": 3
        },
        "shared": true,
        "objective": "facebook"
    }
]

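Based on the above, I should get the following results in the "buckets":

  • document <> 1 <> cartoon
  • document <> 2 <> iron
  • document <> 2 <> paper
  • document <> 3 <> cartoon
  • document <> 3 <> paper
  • document <> 3 <> iron

Of course, this is only for the sake of the example (for convenience, there are no duplicate combinations yet).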

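On top of that, however, I also need some "pre-filters", because I only want documents where:

  • isAvailable is true
  • structure.thickness is between 2 and 4
  • shared is true

Compared with the first set of results, I should then only get the following combination:

  • document <> 2 <> iron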

If you are still reading, then.. thank you! xD

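So, as you can see, I need all possible combinations of these fields, following the static pattern kind <> classification_id <> structure_material, restricted to the documents that match the filters on isAvailable, thickness, and shared.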
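To make the requirement concrete, here is a minimal pure-Python sketch of the expected grouping, run against the six sample documents above (reduced to the relevant fields). It does not touch Elasticsearch. The names `distinct_combinations` and `pre_filter` are hypothetical, and the thickness range of 2 to 4 is taken from the query in the answer further down.

```python
# Sample documents from above, reduced to the fields that matter here.
DOCS = [
    {"id": "1", "isAvailable": True, "kind": "document", "shared": True,
     "classification": {"id": 1}, "structure": {"material": "cartoon", "thickness": 5}},
    {"id": "2", "isAvailable": True, "kind": "document", "shared": True,
     "classification": {"id": 2}, "structure": {"material": "iron", "thickness": 3}},
    {"id": "3", "isAvailable": False, "kind": "document", "shared": False,
     "classification": {"id": 2}, "structure": {"material": "paper", "thickness": 1}},
    {"id": "4", "isAvailable": True, "kind": "document", "shared": False,
     "classification": {"id": 3}, "structure": {"material": "cartoon", "thickness": 5}},
    {"id": "5", "isAvailable": True, "kind": "document", "shared": True,
     "classification": {"id": 3}, "structure": {"material": "paper", "thickness": 1}},
    {"id": "6", "isAvailable": False, "kind": "document", "shared": True,
     "classification": {"id": 3}, "structure": {"material": "iron", "thickness": 3}},
]


def distinct_combinations(docs, keep=lambda doc: True):
    """Return the sorted unique (kind, classification.id, structure.material) tuples."""
    return sorted({
        (d["kind"], d["classification"]["id"], d["structure"]["material"])
        for d in docs
        if keep(d)
    })


def pre_filter(doc):
    """Hypothetical pre-filter: available, shared, thickness between 2 and 4."""
    return (doc["isAvailable"] and doc["shared"]
            and 2 <= doc["structure"]["thickness"] <= 4)


all_combos = distinct_combinations(DOCS)
filtered_combos = distinct_combinations(DOCS, pre_filter)

print(len(all_combos))   # 6
print(filtered_combos)   # [('document', 2, 'iron')]
```

Only document 2 passes all three conditions, which is why a single combination remains after filtering.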

Regarding the output, the hits do not matter to me, since I do not need the documents, only the combinations kind <> classification_id <> structure_material :)

Thanks for your help :)

Max

You can use the Cardinality aggregation together with your existing filters. Please check this URL, and let me know if you have any questions: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
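One caveat worth keeping in mind: a cardinality aggregation is a single-value metric, so it reports an (approximate) count of distinct values rather than the values themselves. The pure-Python sketch below only illustrates that difference on hypothetical in-memory tuples; it does not call Elasticsearch.

```python
# Hypothetical (kind, classification_id, structure_material) tuples, one per document.
rows = [
    ("document", 2, "iron"),
    ("document", 3, "paper"),
    ("document", 3, "paper"),  # same combination seen twice
]

# What a cardinality-style metric gives you: how many distinct values exist.
distinct_count = len(set(rows))

# What the question asks for: the distinct combinations themselves.
distinct_values = sorted(set(rows))

print(distinct_count)   # 2
print(distinct_values)  # [('document', 2, 'iron'), ('document', 3, 'paper')]
```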

Thanks to a colleague, I finally got it running as expected!

The query:

GET index-latest/_search
{
   "size": 0,
   "query": {
      "bool": {
         "filter": [
            {
               "term": {
                  "isAvailable": true
               }
            },
            {
               "range": {
                  "structure.thickness": {
                     "gte": 2,
                     "lte": 4
                  }
               }
            },
            {
               "term": {
                  "shared": true
               }
            }
         ]
      }
   },
   "aggs": {
      "my_agg_example": {
         "composite": {
            "size": 10,
            "sources": [
               {
                  "kind": {
                     "terms": {
                        "field": "kind.keyword",
                        "order": "asc"
                     }
                  }
               },
               {
                  "classification_id": {
                     "terms": {
                        "field": "classification.id",
                        "order": "asc"
                     }
                  }
               },
               {
                  "structure_material": {
                     "terms": {
                        "field": "structure.material.keyword",
                        "order": "asc"
                     }
                  }
               }
            ]
         }
      }
   }
}

The result is then:

{
   "took": 11,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
   },
   "hits": {
      "total": {
         "value": 1,
         "relation": "eq"
      },
      "max_score": null,
      "hits": []
   },
   "aggregations": {
      "my_agg_example": {
         "after_key": {
            "kind": "document",
            "classification_id": 2,
            "structure_material": "iron"
         },
         "buckets": [
            {
               "key": {
                  "kind": "document",
                  "classification_id": 2,
                  "structure_material": "iron"
               },
               "doc_count": 1
            }
         ]
      }
   }
}

So, as we can see, we get the following bucket:

{
    "key": {
        "kind": "document",
        "classification_id": 2,
        "structure_material": "iron"
    },
    "doc_count": 1
}
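The `after_key` in the response is what the composite aggregation uses for pagination: sending it back as `after` inside the `composite` block returns the next page of buckets. A minimal sketch of that loop follows. `search` is a hypothetical callable standing in for whatever client sends the request body and returns the parsed response. `fake_search` and its keys (including the made-up "wood" material) exist only so the sketch runs without a cluster.

```python
import copy


def iter_composite_buckets(search, body, agg_name):
    """Yield every bucket of a composite aggregation by following after_key."""
    body = copy.deepcopy(body)  # do not mutate the caller's request
    while True:
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            break  # an empty page means everything has been consumed
        body["aggs"][agg_name]["composite"]["after"] = after_key


def fake_search(body):
    """Stand-in for a real client: pages through a fixed, pre-sorted key list."""
    keys = [
        {"kind": "document", "classification_id": 2, "structure_material": "iron"},
        {"kind": "document", "classification_id": 3, "structure_material": "paper"},
        {"kind": "document", "classification_id": 3, "structure_material": "wood"},
    ]
    composite = body["aggs"]["my_agg_example"]["composite"]
    start = keys.index(composite["after"]) + 1 if "after" in composite else 0
    page = keys[start:start + composite["size"]]
    agg = {"buckets": [{"key": k, "doc_count": 1} for k in page]}
    if page:
        agg["after_key"] = page[-1]
    return {"aggregations": {"my_agg_example": agg}}


request = {
    "size": 0,
    "aggs": {"my_agg_example": {"composite": {
        "size": 2,  # deliberately small so that more than one page is needed
        "sources": [
            {"kind": {"terms": {"field": "kind.keyword"}}},
            {"classification_id": {"terms": {"field": "classification.id"}}},
            {"structure_material": {"terms": {"field": "structure.material.keyword"}}},
        ],
    }}},
}

buckets = list(iter_composite_buckets(fake_search, request, "my_agg_example"))
materials = [b["key"]["structure_material"] for b in buckets]
print(materials)  # ['iron', 'paper', 'wood']
```

With a single matching bucket, as in the result above, the first page is also the last one, so the loop would stop after its second request.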

Note: Be careful about the type of your field. Putting .keyword on classification.id resulted in no buckets at all. .keyword should only be used on string fields (as far as I understand, correct me if I am wrong).
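Some background on that note: with Elasticsearch's default dynamic mapping, a JSON string is indexed as a `text` field with a `keyword` sub-field, whereas a JSON integer is mapped as `long` and gets no `.keyword` sub-field, so a terms source on `classification.id.keyword` has nothing to bucket. The sketch below picks the field name from a mapping. The helper name `aggregatable_field` is hypothetical, and the `PROPERTIES` dict is an assumption about what the `properties` section of this index's mapping would look like under default dynamic mapping.

```python
# Assumed mapping "properties" for the sample documents (default dynamic mapping).
PROPERTIES = {
    "kind": {"type": "text",
             "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    "classification": {"properties": {"id": {"type": "long"}}},
    "structure": {"properties": {
        "material": {"type": "text",
                     "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
        "thickness": {"type": "long"},
    }},
}


def aggregatable_field(properties, path):
    """Return the field name to use in a terms source for a dotted path.

    Appends ".keyword" only when the field is a text field that actually
    has a keyword sub-field; numeric fields are returned unchanged.
    """
    node = {"properties": properties}
    for part in path.split("."):
        node = node["properties"][part]
    if node.get("type") == "text" and "keyword" in node.get("fields", {}):
        return path + ".keyword"
    return path


for path in ("kind", "classification.id", "structure.material"):
    print(aggregatable_field(PROPERTIES, path))
# kind.keyword
# classification.id
# structure.material.keyword
```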

As expected, we get the following result (compared with the initial question):

  • document <> 2 <> iron

Note: Be careful, the order of the elements within aggs.<name>.composite.sources does play a role in the returned results.
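One way to see why: composite buckets come back sorted by their key, source by source in the order the sources are declared, and `size` cuts that sorted list into pages. The pure-Python sketch below mimics this on hypothetical keys. `first_page` is an illustrative helper, not an Elasticsearch API.

```python
# Hypothetical composite keys, in no particular order.
keys = [
    {"kind": "document", "classification_id": 3, "structure_material": "cartoon"},
    {"kind": "document", "classification_id": 2, "structure_material": "paper"},
    {"kind": "document", "classification_id": 1, "structure_material": "iron"},
]


def first_page(keys, source_order, size):
    """Sort keys by the given source order (ascending) and keep the first `size`."""
    ordered = sorted(keys, key=lambda k: tuple(k[name] for name in source_order))
    return ordered[:size]


by_id_first = first_page(
    keys, ["kind", "classification_id", "structure_material"], size=2)
by_material_first = first_page(
    keys, ["structure_material", "classification_id", "kind"], size=2)

print([k["classification_id"] for k in by_id_first])        # [1, 2]
print([k["classification_id"] for k in by_material_first])  # [3, 1]
```

The full set of combinations is the same either way. What changes is the order of the buckets and therefore which ones land on a given page.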

Thanks!