How to match a phrase in elastic-search with expandable prefix and suffix?

Question

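We have a use case in which we want to match phrases in elastic-search, but in addition to the phrase query we also want to search for partial phrases.

Example:

Search phrase: "welcome you" or "lcome you" or "welcome yo" or "lcome yo". This should match documents containing the phrases: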

"welcome you"

"we welcome you"

"welcome you to"

"we welcome you to"

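That is, we want to preserve the order of the words by performing a phrase query, with the additional functionality that it returns results containing the phrase as a partial substring, with the prefix and suffix expandable up to a certain configurable length. In Elasticsearch I found something similar, 'match_phrase_prefix', but it only matches phrases that start with a particular prefix.

For example, this returns results starting with the prefix d: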

$ curl -XGET localhost:9200/startswith/test/_search?pretty -d '{
    "query": {
        "match_phrase_prefix": {
            "title": {
                "query": "d",
                "max_expansions": 5
            }
        }
    }
}'

Is there any way to achieve this for the suffix as well?

Answer 1

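I strongly suggest you look into the shingle token filter.

You can define an index with a custom analyzer that uses shingles to index pairs of subsequent tokens together, in addition to the tokens themselves.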

curl -XPUT localhost:9200/startswith -d '{
  "settings": {
      "analysis": {
        "analyzer": {
          "my_shingles": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "shingles"
            ]
          }
        },
        "filter": {
          "shingles": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 2,
            "output_unigrams": true
          }
        }
      }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_shingles"
        }
      }
    }
  }
}'

For instance, "we welcome you to" will be indexed as the following tokens:

we
we welcome
welcome
welcome you
you
you to
to
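
To make the token list above easier to reproduce, here is a minimal Python sketch that imitates what the `my_shingles` analyzer emits. It is not Elasticsearch code: the function name `shingle_tokens` is hypothetical, and splitting on whitespace is a simplifying assumption standing in for the `standard` tokenizer.

```python
def shingle_tokens(text, min_size=2, max_size=2, output_unigrams=True):
    """Approximate the lowercase + shingle filter chain on whitespace-split words.

    Assumption: plain whitespace splitting stands in for the standard tokenizer,
    so punctuation handling differs from real Elasticsearch analysis.
    """
    words = text.lower().split()
    tokens = []
    for i in range(len(words)):
        if output_unigrams:
            # The single word at this position.
            tokens.append(words[i])
        for size in range(min_size, max_size + 1):
            # A shingle is `size` consecutive words joined by a space.
            if i + size <= len(words):
                tokens.append(" ".join(words[i:i + size]))
    return tokens


if __name__ == "__main__":
    for token in shingle_tokens("we welcome you to"):
        print(token)
```

With `min_size` and `max_size` both set to 2, as in the index settings, every pair of adjacent words becomes one extra token alongside the single words.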

Then you can index a few sample documents:

curl -XPUT localhost:9200/startswith/test/_bulk -d '
{"index": {}}
{"title": "welcome you"}
{"index": {}}
{"title": "we welcome you"}
{"index": {}}
{"title": "welcome you to"}
{"index": {}}
{"title": "we welcome you to"}
'

Finally, you can run the following query to match all four documents above:

curl -XPOST localhost:9200/startswith/test/_search -d '{
   "query": {
       "match": {"title": "welcome you"}
   }
}'

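Note that this approach is more powerful than the match_phrase_prefix query, because it lets you match subsequent tokens anywhere in the text body, whether at the beginning or at the end.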
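
Why the query matches all four documents can be sketched in plain Python. This is a simplified model, not the real scoring logic: it assumes a `match` query analyzes the query text with the field's analyzer and, with the default `or` operator, accepts any document that shares at least one token. The helper names `shingle_tokens` and `matches` are hypothetical.

```python
def shingle_tokens(text, size=2):
    """Single words plus shingles of `size` adjacent words (whitespace split assumed)."""
    words = text.lower().split()
    tokens = []
    for i in range(len(words)):
        tokens.append(words[i])
        if i + size <= len(words):
            tokens.append(" ".join(words[i:i + size]))
    return tokens


def matches(query, document):
    """Simplified match semantics: true if at least one analyzed token is shared."""
    return bool(set(shingle_tokens(query)) & set(shingle_tokens(document)))


docs = ["welcome you", "we welcome you", "welcome you to", "we welcome you to"]

# Every document shares tokens with the query, so all four are hits.
hits = [d for d in docs if matches("welcome you", d)]

# The two-word shingle itself is present in every document, wherever the phrase sits.
phrase_hits = [d for d in docs if "welcome you" in shingle_tokens(d)]

# A partial-word query produces tokens that none of the documents contain.
partial_hits = [d for d in docs if matches("lcome yo", d)]

if __name__ == "__main__":
    print(hits)
    print(phrase_hits)
    print(partial_hits)
```

In this model the shingle `welcome you` is what ties the query to the phrase regardless of its position in the title. Partial words such as "lcome yo" share no token with the indexed documents, since shingles are built from whole words.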

elasticsearch

elasticsearch-2.0

elasticsearch-5