How to write an Elasticsearch query on the basis of distinct objects?

Question

I am trying to get the distinct attribute names for a given tenant_id and hierarchy_name. This is my index data:

{
  "hits": [
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "84",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "PRODUCT",
        "attribute_name": "GEO"
      }
    },
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "88",
      "_source": {
        "tenant_id": "1",
        "hierarchy_name": "CUSTOMER",
        "attribute_name": "CUSTOMER_OPEN_1"
      }
    },
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "98",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "PRODUCT",
        "attribute_name": "CUSTOMER_OPEN_2"
      }
    },
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "100",
      "_source": {
        "tenant_id": "1",
        "hierarchy_name": "CUSTOMER",
        "attribute_name": "CUSTOMER-ALL"
      }
    },
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "99",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "CUSTOMER",
        "attribute_name": "CUSTOMER_OPEN_2"
      }
    }
  ]
}

This is the query I have tried so far. It returns the distinct attribute_name values on the basis of hierarchy_name:

{
  "query": {
    "multi_match": {
      "query": "CUSTOMER",
      "fields": [
        "hierarchy_name"
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

Now I want to match one more attribute, tenant_id, in addition to hierarchy_name. Can anyone help me with the query?

Expected output: for tenant_id 2 and hierarchy_name PRODUCT, we should get

{
  "hits": [
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "84",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "PRODUCT",
        "attribute_name": "GEO"
      }
    },
    {
      "_index": "emp_indexs_datas_d_v",
      "_type": "bulkindexing",
      "_id": "98",
      "_source": {
        "tenant_id": "2",
        "hierarchy_name": "PRODUCT",
        "attribute_name": "CUSTOMER_OPEN_2"
      }
    }
  ]
}

Answer 1

You can use a bool query with a must clause to combine multiple conditions:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tenant_id": 2
          }
        },
        {
          "multi_match": {
            "query": "PRODUCT",
            "fields": [
              "hierarchy_name"
            ]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "attribute_name.keyword"
  }
}

The search result will be:

"hits": [
      {
        "_index": "67379727",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.4144652,
        "_source": {
          "tenant_id": "2",
          "hierarchy_name": "PRODUCT",
          "attribute_name": "GEO"
        },
        "fields": {
          "attribute_name.keyword": [
            "GEO"
          ]
        }
      },
      {
        "_index": "67379727",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.4144652,
        "_source": {
          "tenant_id": "2",
          "hierarchy_name": "PRODUCT",
          "attribute_name": "CUSTOMER_OPEN_2"
        },
        "fields": {
          "attribute_name.keyword": [
            "CUSTOMER_OPEN_2"
          ]
        }
      }
    ]
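
To make the semantics concrete, here is a minimal plain-Python sketch of what bool/must followed by collapse does. It is only an emulation under stated assumptions: the documents are modeled on the question's sample data using the field names from the queries, matching is simple equality (no analysis or scoring), and the helper name `must_and_collapse` is made up for this illustration. Real collapse keeps the top-sorted hit per group, while this sketch simply keeps the first one in list order.

```python
# Documents modeled on the question's sample data (illustrative only).
DOCS = [
    {"_id": "84", "tenant_id": "2", "hierarchy_name": "PRODUCT", "attribute_name": "GEO"},
    {"_id": "88", "tenant_id": "1", "hierarchy_name": "CUSTOMER", "attribute_name": "CUSTOMER_OPEN_1"},
    {"_id": "98", "tenant_id": "2", "hierarchy_name": "PRODUCT", "attribute_name": "CUSTOMER_OPEN_2"},
    {"_id": "100", "tenant_id": "1", "hierarchy_name": "CUSTOMER", "attribute_name": "CUSTOMER-ALL"},
    {"_id": "99", "tenant_id": "2", "hierarchy_name": "CUSTOMER", "attribute_name": "CUSTOMER_OPEN_2"},
]


def must_and_collapse(docs, conditions, collapse_field):
    """Keep docs that satisfy every condition, then one doc per collapse value."""
    seen = set()
    result = []
    for doc in docs:
        # bool/must: every condition has to hold (logical AND).
        if all(doc.get(field) == value for field, value in conditions.items()):
            key = doc[collapse_field]
            # collapse: keep only the first hit for each distinct value.
            if key not in seen:
                seen.add(key)
                result.append(doc)
    return result


hits = must_and_collapse(
    DOCS, {"tenant_id": "2", "hierarchy_name": "PRODUCT"}, "attribute_name"
)
print([(h["_id"], h["attribute_name"]) for h in hits])
```

Document 99 is excluded because only one of the two must conditions holds for it, which is the behavior the question asks for.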

Answer 2

Here is another approach, which differs from the accepted answer in three ways:

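The analyzed match query is replaced with a non-analyzed term filter. Using analyzed queries can produce unexpected or surprising results (see the match docs for an explanation).
The multi-match query is replaced with a term query. Using multi-match on a single field is somewhat redundant and harder to read, and it is another analyzed query.
collapse is replaced with a terms aggregation. This is simply the way I have always done it.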
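
The difference between analyzed and non-analyzed matching can be sketched in a few lines of Python. This is a toy model, not Elasticsearch code: `analyze` is a very rough stand-in for an analyzer (lowercase, split on non-word characters), and `match_query` and `term_query` are hypothetical helpers that only imitate the idea of each query type.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def match_query(field_value, query):
    """Analyzed match: true if any query token is among the field's tokens."""
    return bool(set(analyze(query)) & set(analyze(field_value)))


def term_query(field_value, query):
    """Non-analyzed term on a keyword-like field: exact, case-sensitive equality."""
    return field_value == query


values = ["CUSTOMER", "CUSTOMER-ALL", "customer"]
print([v for v in values if match_query(v, "CUSTOMER")])
print([v for v in values if term_query(v, "CUSTOMER")])
```

In this model the analyzed match also accepts "CUSTOMER-ALL" and "customer", which is the kind of surprise the first point warns about, while the exact comparison accepts only "CUSTOMER".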

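Using a terms agg to get all values of attribute_name.keyword means the number of results is limited: only the top `size` buckets are returned, and each shard contributes only its own top terms, so with many distinct values some can be missed. This can be solved by using a composite aggregation, which pages through all buckets. I don't know whether the same problem applies to collapse, but if you have a large number of distinct values it would be wise to check.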
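
As a hedged sketch of the composite aggregation idea, the Python below builds the request body and follows `after_key` from page to page until no buckets are left. The names `composite_body`, `all_distinct_values` and `AGG_NAME` are invented for this example, and `search` is a hypothetical callable that sends a request body to Elasticsearch and returns the parsed response as a dict; nothing here is taken from the original answers.

```python
AGG_NAME = "distinct_attribute_names"


def composite_body(after_key=None, page_size=1000, query=None):
    """Build a request body that pages over attribute_name.keyword values."""
    composite = {
        "size": page_size,
        "sources": [
            {"attribute_name": {"terms": {"field": "attribute_name.keyword"}}}
        ],
    }
    if after_key is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after_key
    body = {"size": 0, "aggs": {AGG_NAME: {"composite": composite}}}
    if query is not None:
        body["query"] = query
    return body


def all_distinct_values(search, page_size=1000, query=None):
    """Collect every bucket key by following after_key until a page is empty."""
    values = []
    after_key = None
    while True:
        response = search(composite_body(after_key, page_size, query))
        agg = response["aggregations"][AGG_NAME]
        buckets = agg["buckets"]
        if not buckets:
            return values
        values.extend(bucket["key"]["attribute_name"] for bucket in buckets)
        after_key = agg.get("after_key")
        if after_key is None:
            return values
```

The `query` argument can carry the same bool query used for filtering, so the paging only covers documents for the chosen tenant_id and hierarchy_name.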

The query using term queries and a terms aggregation:

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tenant_id": 2
          }
        },
        {
          "term": {
            "hierarchy_name": "PRODUCT"
          }
        }
      ]
    }
  },
  "aggs": {
    "distinct_attribute_names": {
      "terms": {
        "field": "attribute_name.keyword",
        "size": 1000
      }
    }
  },
  "size": 0
}
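
For completeness, here is a small Python sketch of how such a request could be built and how the distinct values can be read from the response. The function names are made up for illustration, and `sample_response` is a hand-written dict in the shape a terms aggregation returns, not real output. One assumption to check: term queries compare against the indexed terms exactly, so this presumes tenant_id and hierarchy_name are indexed without analysis; if hierarchy_name is an analyzed text field, querying a `hierarchy_name.keyword` sub-field (if one exists in your mapping) may be needed instead.

```python
def build_body(tenant_id, hierarchy_name, size=1000):
    """Build the term + terms-aggregation request body as a dict."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"tenant_id": tenant_id}},
                    {"term": {"hierarchy_name": hierarchy_name}},
                ]
            }
        },
        "aggs": {
            "distinct_attribute_names": {
                "terms": {"field": "attribute_name.keyword", "size": size}
            }
        },
        # No hits are needed, only the aggregation buckets.
        "size": 0,
    }


def distinct_attribute_names(response):
    """Each bucket key is one distinct attribute_name value."""
    buckets = response["aggregations"]["distinct_attribute_names"]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hand-written example response (illustrative, not captured from a cluster).
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "distinct_attribute_names": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "CUSTOMER_OPEN_2", "doc_count": 1},
                {"key": "GEO", "doc_count": 1},
            ],
        }
    },
}

print(distinct_attribute_names(sample_response))
```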

Tags: elasticsearch, logstash, kibana, elasticsearch-5, kibana-6