How to implement an exact match in a filter with Elasticsearch?

I'm working on Elasticsearch 2.4 with a query based on name fields. The fields I'm interested in are:

If I send this query:

    {"query": 
        {"bool" : 
            {"must" : [
                {"match" : {"state" : {"query" : "michoacán de ocampo", "type" : "boolean"} } }, 
                {"match" : {"colony" : {"query" : "zamora", "type" : "boolean"} } }, 
                {"match" : {"city" : {"query" : "zamora", "type" : "boolean"} } } 
            ], 
            "filter" : {"term" : {"state" : "michoacán"} } 
        } 
    } }

the result is:

    {
        "_shards": {
            "failed": 0,
            "successful": 5,
            "total": 5
        },
        "hits": {
            "hits": [
                {
                    "_id": "71807",
                    "_index": "my_place",
                    "_score": 8.708784,
                    "_source": {
                        "@timestamp": "2019-11-13T15:34:33.373Z",
                        "@version": "1",
                        "city": "Zamora",
                        "city_id": 828,
                        "colony": "Balcones de Zamora",
                        "id": 71807,
                        "state": "Michoacán de Ocampo",
                        "state_id": 16,
                        "type": "place",
                        "zipcode": "59624",
                        "zone_id": null
                    },
                    "_type": "place"
                },
                {
                    "_id": "71762",
                    "_index": "my_place",
                    "_score": 8.634264,
                    "_source": {
                        "@timestamp": "2019-11-13T15:34:33.112Z",
                        "@version": "1",
                        "city": "Zamora",
                        "city_id": 828,
                        "colony": "Zamora de Hidalgo Centro",
                        "id": 71762,
                        "state": "Michoacán de Ocampo",
                        "state_id": 16,
                        "type": "place",
                        "zipcode": "59600",
                        "zone_id": null
                    },
                    "_type": "place"
                }
            ],
            "max_score": 8.708784,
            "total": 2
        },
        "timed_out": false,
        "took": 5
    }

which is fine.

But if I send the full name of the state in the filter, like this (note the full name "Michoacán de Ocampo" in the filter):

    {"query": 
        {"bool" : 
            {"must" : [
                {"match" : {"state" : {"query" : "michoacán de ocampo", "type" : "boolean"} } }, 
                {"match" : {"colony" : {"query" : "zamora", "type" : "boolean"} } }, 
                {"match" : {"city" : {"query" : "zamora", "type" : "boolean"} } } 
            ], 
            "filter" : {"term" : {"state" : "Michoacán de Ocampo"} } 
        } 
    } }

I get these results:

    {
        "_shards": {
            "failed": 0,
            "successful": 5,
            "total": 5
        },
        "hits": {
            "hits": [],
            "max_score": null,
            "total": 0
        },
        "timed_out": false,
        "took": 6
    }

I need to send the full name in the filter. How can I achieve this, or how should I reconfigure my index, to get the same results?

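My guess is that the mapping of your `state` field is the default mapping, i.e. `state` is a text field with a keyword sub-field (the default dynamic mapping in Elasticsearch 5.x and later; see dynamic field mapping).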

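If that is the case, the filter in your first query "works" because it matches one of the tokens created by the default text analyzer. Indeed, "Michoacán de Ocampo" is processed into these three lowercase tokens: ["michoacán", "de", "ocampo"].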
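To make that concrete, here is a minimal Python sketch of the idea. It is only a toy model: `standard_like_analyze` and `term_filter` are hypothetical helpers that imitate "split on non-word characters, then lowercase" and a verbatim term lookup. They are not the real Elasticsearch analyzer or API.

```python
import re

def standard_like_analyze(text):
    """Rough stand-in for the default text analyzer (assumption):
    split on non-word characters and lowercase every token."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# Toy inverted index for the `state` field: token -> set of document ids.
docs = {"71807": "Michoacán de Ocampo", "71762": "Michoacán de Ocampo"}
inverted_index = {}
for doc_id, state in docs.items():
    for token in standard_like_analyze(state):
        inverted_index.setdefault(token, set()).add(doc_id)

def term_filter(value):
    """A term filter is not analyzed: the value is looked up verbatim."""
    return inverted_index.get(value, set())

print(sorted(inverted_index))               # the three lowercase tokens
print(term_filter("michoacán"))             # both documents
print(term_filter("Michoacán de Ocampo"))   # empty set: no such token
```

The first filter hits because `michoacán` happens to be one of the indexed tokens; the full, capitalised name is never stored as a single token, so the lookup finds nothing.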

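For the same reason, the second filter cannot match: a `term` filter is not analyzed, so the phrase "Michoacán de Ocampo" is looked up as a single term in its original case, and no such token exists in the analyzed field. The following query, which filters on the keyword sub-field instead, should work: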

    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "state": {
                  "query": "michoacán de ocampo"
                }
              }
            },
            {
              "match": {
                "colony": {
                  "query": "zamora"
                }
              }
            },
            {
              "match": {
                "city": {
                  "query": "zamora"
                }
              }
            }
          ],
          "filter": {
            "term": {
              "state.keyword": "Michoacán de Ocampo"
            }
          }
        }
      }
    }
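As a rough illustration of why the keyword sub-field behaves differently, the sketch below models a keyword-style field as an index whose only term is the whole, unmodified value. `keyword_index` and `term_filter_keyword` are hypothetical names used for this toy model, not Elasticsearch internals.

```python
# Toy model of a keyword-style field: the entire value is one term,
# stored exactly as it was indexed (no tokenizing, no lowercasing).
docs = {"71807": "Michoacán de Ocampo", "71762": "Michoacán de Ocampo"}

keyword_index = {}
for doc_id, state in docs.items():
    keyword_index.setdefault(state, set()).add(doc_id)

def term_filter_keyword(value):
    """Verbatim lookup, as a term filter on the keyword sub-field would do."""
    return keyword_index.get(value, set())

print(term_filter_keyword("Michoacán de Ocampo"))  # both documents
print(term_filter_keyword("michoacán de ocampo"))  # empty set: case differs
print(term_filter_keyword("michoacán"))            # empty set: partial value
```

The trade-off is that the filter value now has to match the stored value exactly, including case and accents.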

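Update: the OP mentioned in the comments that he is using 2.4, so I am updating my answer to include a solution that works for that version.

ES 2.4 solution

Create the index with the required settings and mappings: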

    {
        "settings": {
            "analysis": {
                "analyzer": {
                    "lckeyword": {
                        "filter": [
                            "lowercase"
                        ],
                        "tokenizer": "keyword"
                    }
                }
            }
        },
        "mappings": {
            "so": {
                "properties": {
                    "state": {
                        "type": "string"
                    },
                    "city": {
                        "type": "string"
                    },
                    "colony": {
                        "type": "string"
                    },
                    "state_raw": {
                        "type": "string",
                        "analyzer": "lckeyword"
                    }
                }
            }
        }
    }

Search query:

    {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "should": [
                            {
                                "match": {
                                    "state": {
                                        "query": "michoacán de ocampo"
                                    }
                                }
                            },
                            {
                                "match": {
                                    "colony": {
                                        "query": "zamora"
                                    }
                                }
                            },
                            {
                                "match": {
                                    "city": {
                                        "query": "zamora"
                                    }
                                }
                            }
                        ]
                    }
                },
                "filter": {
                    "term": {
                        "state_raw": "michoacán de ocampo"
                    }
                }
            }
        }
    }

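The important thing to note here is the custom analyzer (keyword tokenizer plus a lowercase filter): the field we filter on is indexed as a single token, unchanged except for being lowercased, which is why the filter value is passed in lowercase. Note that with this mapping each document has to be indexed with the state name in `state_raw` as well as in `state`. The query above now returns your documents. The accompanying Postman collection contains the index creation, sample document creation, and the query, which returns both documents.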
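A small Python sketch of what the `lckeyword` analyzer is meant to do, under the assumption that "keyword tokenizer plus lowercase filter" can be modelled as "one token, lowercased". The helper names are hypothetical and this is not the real analysis chain.

```python
def lckeyword_analyze(text):
    """Toy model of the custom analyzer (assumption):
    keyword tokenizer keeps the whole input as one token,
    then the lowercase filter lowercases it."""
    return [text.lower()]

# Toy inverted index for the `state_raw` field.
docs = {"71807": "Michoacán de Ocampo", "71762": "Michoacán de Ocampo"}
state_raw_index = {}
for doc_id, state in docs.items():
    for token in lckeyword_analyze(state):
        state_raw_index.setdefault(token, set()).add(doc_id)

def term_filter_state_raw(value):
    """Term filters are not analyzed, so the caller must lowercase the value."""
    return state_raw_index.get(value, set())

print(sorted(state_raw_index))                         # one lowercase token
print(term_filter_state_raw("michoacán de ocampo"))    # both documents
print(term_filter_state_raw("Michoacán de Ocampo"))    # empty set
```

In other words, with this approach the application has to lowercase the state name before putting it in the `term` filter; sending the capitalised form would miss again.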

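ES 7.X solution

The problem is that you defined the `state` field as a `text` field, and in your filter you are using a [term][1] query, which is not analyzed, as the official ES docs explain: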

Returns documents that contain an exact term in a provided field.

Hence it tries to find the token `Michoacán de Ocampo` in the inverted index, where it is not present: because the `state` field is defined as `text`, indexing generates three tokens, `michoacán`, `de` and `ocampo`, and ES matches search-term tokens against inverted-index tokens. You can check these tokens with the [analyze API][2], and you can use the [explain API][3] to see which tokens matched when a query does return results.

Fix
---
Define the `state` field as a [multi-field][4] and also store it as-is (in `keyword` form) so that you can filter on it.

    {
        "mappings": {
            "properties": {
                "state": {
                    "type": "text",
                    "fields": {
                        "raw": {
                            "type": "keyword"
                        }
                    }
                },
                "city": {
                    "type": "text"
                },
                "colony": {
                    "type": "text"
                }
            }
        }
    }

The query below should now return both results. Notice the `.raw` suffix in the filter, which searches on the keyword sub-field.

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            "state": {
                                "query": "michoacán de ocampo"
                            }
                        }
                    },
                    {
                        "match": {
                            "colony": {
                                "query": "zamora"
                            }
                        }
                    },
                    {
                        "match": {
                            "city": {
                                "query": "zamora"
                            }
                        }
                    }
                ],
                "filter": {
                    "term": {
                        "state.raw": "Michoacán de Ocampo"
                    }
                }
            }
        }
    }
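If you build this request from application code, something like the following Python sketch may help. `build_place_query` and its `exact_field` parameter are hypothetical names for illustration; the function only assembles the request body as a dict and does not talk to Elasticsearch.

```python
import json

def build_place_query(state, colony, city, exact_field="state.raw"):
    """Assemble a bool query: analyzed `match` clauses for scoring,
    plus a non-analyzed `term` filter on the exact-value sub-field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"state": {"query": state}}},
                    {"match": {"colony": {"query": colony}}},
                    {"match": {"city": {"query": city}}},
                ],
                # The term value must equal the stored keyword exactly.
                "filter": {"term": {exact_field: state}},
            }
        }
    }

body = build_place_query("Michoacán de Ocampo", "zamora", "zamora")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The same state string is passed to both the `match` clause and the `term` filter here; that is fine for `match`, which is analyzed, but the `term` value has to be in exactly the form stored in the sub-field.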

Edit: https://www.getpostman.com/collections/f4b9ed00d50e2f4bc7f4 is the Postman collection link, if you want to test it quickly.