How to write an Elasticsearch query on the basis of distinct objects?
I am trying to get the distinct attribute names for a given tenant_id and hierarchy_name. This is my index data:
{
"hits": [
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "84",
"_source": {
"id": "2",
"name": "PRODUCT",
"values": "GEO"
}
},
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "88",
"_source": {
"id": "1",
"name": "CUSTOMER",
"values": "CUSTOMER_OPEN_1"
}
},
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "98",
"_source": {
"id": "2",
"name": "PRODUCT",
"values": "CUSTOMER_OPEN_2"
}
},
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "100",
"_source": {
"id": "1",
"name": "CUSTOMER",
"values": "CUSTOMER-ALL"
}
},
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "99",
"_source": {
"id": "2",
"name": "CUSTOMER",
"values": "CUSTOMER_OPEN_2"
}
}
]
}
This is the query I have tried so far. It gives me the distinct attribute_name values based on hierarchy_name:
{
"query": {
"multi_match": {
"query": "CUSTOMER",
"fields": [
"hierarchy_name"
]
}
},
"collapse": {
"field": "attribute_name.keyword"
}
}
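Now I want to match on one more attribute, tenant_id, in addition to hierarchy_name, which I was already matching on. Can anyone help me with the query?
Expected output, assuming tenant_id 2 and hierarchy_name PRODUCT: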
{
"hits": [
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "84",
"_source": {
"tenant_id": "2",
"hierarchy_name": "PRODUCT",
"attribute_name": "GEO"
}
},
{
"_index": "emp_indexs_datas_d_v",
"_type": "bulkindexing",
"_id": "98",
"_source": {
"tenant_id": "2",
"hierarchy_name": "PRODUCT",
"attribute_name": "CUSTOMER_OPEN_2"
}
}
]
}
You can use a bool query with must clauses to combine multiple conditions:
{
"query": {
"bool": {
"must": [
{
"match": {
"tenant_id": 2
}
},
{
"multi_match": {
"query": "PRODUCT",
"fields": [
"hierarchy_name"
]
}
}
]
}
},
"collapse": {
"field": "attribute_name.keyword"
}
}
The search result will be:
"hits": [
{
"_index": "67379727",
"_type": "_doc",
"_id": "1",
"_score": 1.4144652,
"_source": {
"tenant_id": "2",
"hierarchy_name": "PRODUCT",
"attribute_name": "GEO"
},
"fields": {
"attribute_name.keyword": [
"GEO"
]
}
},
{
"_index": "67379727",
"_type": "_doc",
"_id": "3",
"_score": 1.4144652,
"_source": {
"tenant_id": "2",
"hierarchy_name": "PRODUCT",
"attribute_name": "CUSTOMER_OPEN_2"
},
"fields": {
"attribute_name.keyword": [
"CUSTOMER_OPEN_2"
]
}
}
]
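To make the semantics of the bool/must query plus collapse concrete, here is a minimal pure-Python sketch that does not talk to Elasticsearch at all. It keeps only the documents that satisfy both conditions and then keeps the first hit per distinct attribute_name. The function name distinct_by_attribute and the in-memory DOCS list are illustrative assumptions, and the exact string comparison is a simplification: a real match query is analyzed and scored.

```python
# Sample documents from the question, written with the field names the
# queries refer to (tenant_id, hierarchy_name, attribute_name).
DOCS = [
    {"_id": "84", "tenant_id": "2", "hierarchy_name": "PRODUCT", "attribute_name": "GEO"},
    {"_id": "88", "tenant_id": "1", "hierarchy_name": "CUSTOMER", "attribute_name": "CUSTOMER_OPEN_1"},
    {"_id": "98", "tenant_id": "2", "hierarchy_name": "PRODUCT", "attribute_name": "CUSTOMER_OPEN_2"},
    {"_id": "100", "tenant_id": "1", "hierarchy_name": "CUSTOMER", "attribute_name": "CUSTOMER-ALL"},
    {"_id": "99", "tenant_id": "2", "hierarchy_name": "CUSTOMER", "attribute_name": "CUSTOMER_OPEN_2"},
]


def distinct_by_attribute(docs, tenant_id, hierarchy_name):
    """Hypothetical helper: emulate bool/must (both conditions) plus collapse."""
    seen = set()
    hits = []
    for doc in docs:
        # bool/must: every clause has to match.
        if doc["tenant_id"] != str(tenant_id):
            continue
        if doc["hierarchy_name"] != hierarchy_name:
            continue
        # collapse: keep only one hit per distinct attribute_name.
        if doc["attribute_name"] in seen:
            continue
        seen.add(doc["attribute_name"])
        hits.append(doc)
    return hits


result = distinct_by_attribute(DOCS, 2, "PRODUCT")
print([doc["_id"] for doc in result])  # prints ['84', '98']
```

Documents 84 and 98 are the two that the expected output in the question asks for.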
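Here is another approach, which differs from the accepted answer in three ways:
- The analyzed match query is replaced with a non-analyzed term filter. Using analyzed queries as filters can yield unexpected or surprising results (see the match docs for an explanation).
- The multi_match query is replaced with a term query. Using multi_match on a single field is somewhat redundant and hard to read, and it is yet another analyzed query.
- collapse is replaced with a terms aggregation. That is simply the way I have always done it.
Using a terms agg to get all values of attribute_name.keyword means the number of buckets returned is capped (by size overall, and by shard_size on each shard), so with many distinct values some can be missing. This can be solved by using a composite aggregation. I don't know whether the same issue applies to collapse, but if you have a large number of distinct values it would be wise to check.
The query using term queries and a terms aggregation: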
{
"query": {
"bool": {
"must": [
{
"term": {
"tenant_id": 2
}
},
{
"term": {
"hierarchy_name": "PRODUCT"
}
}
]
}
},
"aggs": {
"distinct_attribute_names": {
"terms": {
"field": "attribute_name.keyword",
"size": 1000
}
}
},
"size": 0
}
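The composite aggregation mentioned above can be paged through until every distinct value has been seen. Below is a rough sketch of that loop, not a definitive implementation. The names composite_body and all_attribute_names are hypothetical, and search stands for any callable you supply that takes a request body dict and returns the response as a dict (for example a thin wrapper around your Elasticsearch client). One more assumption to verify against your mapping: if hierarchy_name is an analyzed text field, a term query for "PRODUCT" may not match, and hierarchy_name.keyword may be needed instead.

```python
def composite_body(tenant_id, hierarchy_name, page_size=100, after=None):
    """Build one page of a composite aggregation request (hypothetical helper)."""
    composite = {
        "size": page_size,
        "sources": [
            {"attribute_name": {"terms": {"field": "attribute_name.keyword"}}}
        ],
    }
    if after is not None:
        # Resume after the last bucket key of the previous page.
        composite["after"] = after
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "query": {
            "bool": {
                "must": [
                    {"term": {"tenant_id": tenant_id}},
                    # Assumption: swap in "hierarchy_name.keyword" if the
                    # field is an analyzed text field.
                    {"term": {"hierarchy_name": hierarchy_name}},
                ]
            }
        },
        "aggs": {"distinct_attribute_names": {"composite": composite}},
    }


def all_attribute_names(search, tenant_id, hierarchy_name, page_size=100):
    """Collect every distinct attribute_name by following after_key.

    `search` is a caller-supplied callable: request body dict -> response dict.
    """
    names = []
    after = None
    while True:
        response = search(composite_body(tenant_id, hierarchy_name, page_size, after))
        agg = response["aggregations"]["distinct_attribute_names"]
        names.extend(bucket["key"]["attribute_name"] for bucket in agg["buckets"])
        after = agg.get("after_key")
        # Stop when a page is empty or carries no key to resume from.
        if after is None or not agg["buckets"]:
            break
    return names
```

Compared with a terms aggregation with a large size, this trades one big request for several small ones, which is usually the safer choice when the number of distinct values is unknown.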