How to implement an exact match in a filter with Elasticsearch?
I'm working on Elasticsearch 2.4 with queries based on name fields. The fields I'm interested in are:
- state
- city
- colony
If I send this query:
{"query":
{"bool" :
{"must" : [
{"match" : {"state" : {"query" : "michoacán de ocampo", "type" : "boolean"} } },
{"match" : {"colony" : {"query" : "zamora", "type" : "boolean"} } },
{"match" : {"city" : {"query" : "zamora", "type" : "boolean"} } }
],
"filter" : {"term" : {"state" : "michoacán"} }
}
} }
The result is:
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "71807",
"_index": "my_place",
"_score": 8.708784,
"_source": {
"@timestamp": "2019-11-13T15:34:33.373Z",
"@version": "1",
"city": "Zamora",
"city_id": 828,
"colony": "Balcones de Zamora",
"id": 71807,
"state": "Michoacán de Ocampo",
"state_id": 16,
"type": "place",
"zipcode": "59624",
"zone_id": null
},
"_type": "place"
},
{
"_id": "71762",
"_index": "my_place",
"_score": 8.634264,
"_source": {
"@timestamp": "2019-11-13T15:34:33.112Z",
"@version": "1",
"city": "Zamora",
"city_id": 828,
"colony": "Zamora de Hidalgo Centro",
"id": 71762,
"state": "Michoacán de Ocampo",
"state_id": 16,
"type": "place",
"zipcode": "59600",
"zone_id": null
},
"_type": "place"
}
],
"max_score": 8.708784,
"total": 2
},
"timed_out": false,
"took": 5
}
Which is fine.
But if I send the full name of the state in the filter, like this (note the full name "Michoacán de Ocampo" in the filter):
{"query":
{"bool" :
{"must" : [
{"match" : {"state" : {"query" : "michoacán de ocampo", "type" : "boolean"} } },
{"match" : {"colony" : {"query" : "zamora", "type" : "boolean"} } },
{"match" : {"city" : {"query" : "zamora", "type" : "boolean"} } }
],
"filter" : {"term" : {"state" : "Michoacán de Ocampo"} }
}
} }
I get these results:
{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [],
"max_score": null,
"total": 0
},
"timed_out": false,
"took": 6
}
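I need to send the full name in the filter. How can I achieve that, or how should I reconfigure my index, to get the same results as the first query?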
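My guess is that your `state` field uses the default mapping, i.e. `state` is a text field with a keyword sub-field (see dynamic field mapping).
If that is the case, the filter in your first query "works" only because it happens to match one of the tokens produced by the default text analyzer. Indeed, "Michoacán de Ocampo" is processed into these three lowercase tokens: ["michoacán", "de", "ocampo"].
For the same reason, the second filter cannot match: a term filter is not analyzed, so the phrase "Michoacán de Ocampo" is looked up as a single term with its original capitalization, and no such token exists. The following query should work: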
{
"query": {
"bool": {
"must": [
{
"match": {
"state": {
"query": "michoacán de ocampo"
}
}
},
{
"match": {
"colony": {
"query": "zamora"
}
}
},
{
"match": {
"city": {
"query": "zamora"
}
}
}
],
"filter": {
"term": {
"state.keyword": "Michoacán de Ocampo"
}
}
}
}
}
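To make the token explanation above concrete, here is a minimal Python sketch. It does not talk to Elasticsearch; `standard_like_tokens` and `term_matches` are hypothetical helpers that only approximate the default analyzer (lowercase, split on whitespace), which is enough for this particular value.

```python
def standard_like_tokens(text):
    """Rough stand-in for the default analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def term_matches(indexed_value, term):
    """A term filter is not analyzed: the term is compared to the indexed tokens as-is."""
    return term in standard_like_tokens(indexed_value)


state = "Michoacán de Ocampo"
print(standard_like_tokens(state))                 # the three lowercase tokens
print(term_matches(state, "michoacán"))            # first filter: matches one token
print(term_matches(state, "Michoacán de Ocampo"))  # second filter: no such token
```

The first filter succeeds only because "michoacán" happens to be one of the tokens; the full name never exists as a single token in an analyzed field.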
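Update: as the OP mentioned in the comments that they are using 2.4, I'm updating my answer to include a solution that works for that version.
ES 2.4 solution
Create the index with the required settings and mappings: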
{
"settings": {
"analysis": {
"analyzer": {
"lckeyword": {
"filter": [
"lowercase"
],
"tokenizer": "keyword"
}
}
}
},
"mappings": {
"so": {
"properties": {
"state": {
"type": "string"
},
"city": {
"type": "string"
},
"colony": {
"type": "string"
},
"state_raw": {
"type": "string",
"analyzer": "lckeyword"
}
}
}
}
}
Search query:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"match": {
"state": {
"query": "michoacán de ocampo"
}
}
},
{
"match": {
"colony": {
"query": "zamora"
}
}
},
{
"match": {
"city": {
"query": "zamora"
}
}
}
]
}
},
"filter": {
"term": {
"state_raw": "michoacán de ocampo"
}
}
}
}
}
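The important thing to note here is the custom analyzer (keyword tokenizer with a lowercase filter): the field we filter on is indexed as a single token, unchanged except for being lowercased. Because the term filter itself is not analyzed, the value you pass in the filter must be lowercase as well. Note that `state_raw` is a separate field, so each document has to carry the state name in it too. The query above now returns your documents; this is a Postman collection with the index creation, sample document creation, and the query, which returns both documents.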
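As a rough illustration of what the `lckeyword` analyzer does to the indexed value, here is a small Python sketch. It is only a simulation under the assumption that the keyword tokenizer emits the whole input as one token; the function names are hypothetical and nothing here calls Elasticsearch.

```python
def lckeyword(text):
    """Simulate the lckeyword analyzer: whole input as one token, then lowercase."""
    return [text.lower()]


def term_filter(indexed_value, term):
    """The filter term is not analyzed, so it must already be lowercase to match."""
    return term in lckeyword(indexed_value)


indexed = "Michoacán de Ocampo"
print(lckeyword(indexed))                                   # a single lowercase token
print(term_filter(indexed, "michoacán de ocampo"))          # matches
print(term_filter(indexed, "Michoacán de Ocampo"))          # does not match
print(term_filter(indexed, "Michoacán de Ocampo".lower()))  # lowercase on the client first
```

If the client receives the state name with capital letters, lowercasing it before building the filter keeps the exact match working.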
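ES 7.X solution
The problem is that you defined the `state` field as a `text` field, and then in your filter you use a [term][1] query, which is not analyzed, as explained in the official ES docs: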
Returns documents that contain an exact term in a provided field.
Hence it tries to find the token `Michoacán de Ocampo` in the inverted index, and that token isn't there: because `state` is defined as `text`, indexing generates three tokens, `michoacán`, `de` and `ocampo`, and ES matches the search term against the tokens in the inverted index one to one. You can check these tokens with the [analyze API][2], and you can use the [explain API][3] to see which tokens matched when a query does return results.
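To check the tokens yourself, a request to the analyze API can be built as in the sketch below. This is only an illustration: `build_analyze_request` is a hypothetical helper, the index name `my_place` is taken from the results in the question, and actually sending the request to your cluster is left to whatever HTTP client you use.

```python
import json


def build_analyze_request(index, field, text):
    """Return (method, path, body) for an _analyze call using the field's own analyzer."""
    path = "/{}/_analyze".format(index)
    body = json.dumps({"field": field, "text": text}, ensure_ascii=False)
    return "GET", path, body


method, path, body = build_analyze_request("my_place", "state", "Michoacán de Ocampo")
print(method, path)
print(body)
```

Running the same request against the keyword sub-field instead of `state` should show a single token holding the whole value, which is what the term filter needs.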
Fix
---
Define the `state` field as a [multi-field][4] so that it is also stored as is (in keyword form) and you can filter on it.
{
"mappings": {
"properties": {
"state": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
},
"city": {
"type": "text"
},
"colony": {
"type": "text"
}
}
}
}
Now the query below gives you both results. Notice the `.raw` suffix in the filter, which targets the keyword sub-field.
{
"query": {
"bool": {
"must": [
{
"match": {
"state": {
"query": "michoacán de ocampo"
}
}
},
{
"match": {
"colony": {
"query": "zamora"
}
}
},
{
"match": {
"city": {
"query": "zamora"
}
}
}
],
"filter": {
"term": {
"state.raw": "Michoacán de Ocampo"
}
}
}
}
}
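If the query is assembled in application code, the same request body can be built as in this sketch. The helper `build_place_query` and its `exact_field` parameter are hypothetical names for illustration; the structure mirrors the query above, with analyzed `match` clauses for scoring and an unanalyzed `term` filter on the keyword sub-field.

```python
import json


def build_place_query(state, colony, city, exact_field="state.raw"):
    """Build the bool query: analyzed matches in must, exact state name in filter."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"state": {"query": state}}},
                    {"match": {"colony": {"query": colony}}},
                    {"match": {"city": {"query": city}}},
                ],
                # The term value is sent untouched, so it must equal the stored
                # keyword exactly, including capitalization and accents.
                "filter": {"term": {exact_field: state}},
            }
        }
    }


body = build_place_query("Michoacán de Ocampo", "zamora", "zamora")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Passing the full state name to both clauses works because the `match` clause analyzes it into lowercase tokens, while the `term` filter compares it unchanged against the keyword value.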
Edit: https://www.getpostman.com/collections/f4b9ed00d50e2f4bc7f4 is the Postman collection link if you want to test it quickly.