How to use fuzzy and match phrase?

Question

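I am using Elasticsearch version 7.6.2.

I want to search for a sentence with fuzziness, but only get results that have the same word order as the sentence (as with match_phrase).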

Example:

PUT demo_idx/_doc/1
{
  "content": "michael jordan and scottie pippen"
}

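I want to search for the following sentences (with fuzziness equal to 2):

"michael jordan and scottie pippen" -> returns the result (reason: same sentence)
"scottie pippen and michael jordan" -> 0 results (reason: the words are in the wrong order)
"ichael jordan and scottie pippen" -> returns the result (reason: the 'm' of michael is missing, fuzziness 1)
"ichae jordan and scottie pippen" -> returns the result (reason: the 'm' and 'l' of michael are missing, fuzziness 2)
"ichael jordan and cottie pippen" -> returns the result (reason: the 'm' of michael and the 's' of scottie are missing, fuzziness 2)
"ichael jordan and cottie pippe" -> 0 results (reason: the 'm' of michael, the 's' of scottie and the 'n' of pippen are missing, fuzziness 3)
"ichael jordan and ottie pippen" -> 0 results (reason: the 'm' of michael and the 's' and 'c' of scottie are missing, fuzziness 3)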

I read and tried the solution from this post, but I did not get the desired results.

I tried:

"query": {
            "span_near": {
                "clauses": [
                    {"span_multi":
                     {
                         "match": {
                             "fuzzy": {
                                "content": {
                                    "value": query,
                                    "fuzziness": 2
                                }
                            }
                            }
                     }
                    }
                ],
            }
        }

But it did not work.

How should I write the query to get the results I want?

Answer 1

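If you want fuzziness to apply to the sentence as a whole, you can use the keyword tokenizer when defining the index, so that the entire field value is indexed as a single token: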

PUT test_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "custom_english"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_english": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ],
          "type":"custom"
        }
      }
    }
  }
}

Then you can use a fuzzy query to get your search results:

GET test_index/_search
{
  "query": {
    "fuzzy": {
      "content": {
        "value": "ichael jordan and ottie pippen"
      }
    }
  }
}
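
The request above leaves `fuzziness` unset, so it relies on the default (`AUTO`, as far as I know, which allows up to 2 edits for terms longer than 5 characters). Here is a minimal Python sketch that builds the same request body with the limit spelled out. The helper name `build_fuzzy_query` is made up for illustration, and lowercasing the input rests on the assumption that a term-level query such as `fuzzy` is not run through the field's analyzer:

```python
import json


def build_fuzzy_query(field, text, fuzziness=2):
    """Build a fuzzy query body for a field indexed as a single token.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    return {
        "query": {
            "fuzzy": {
                field: {
                    # Lowercase here to mirror the index-time "lowercase"
                    # filter (assumption: the query value is not analyzed).
                    "value": text.lower(),
                    # Maximum number of edits allowed for a match.
                    "fuzziness": fuzziness,
                }
            }
        }
    }


body = build_fuzzy_query("content", "Ichael Jordan and cottie pippen")
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of GET test_index/_search.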

This works for all the test cases you mentioned in the question.
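
With the keyword tokenizer the whole sentence is one term, so the fuzzy query compares the full query string against the full indexed string. The Python sketch below checks the question's test cases against a limit of 2 edits using a plain Levenshtein distance. It only approximates what Elasticsearch does (by default its fuzzy matching also counts swapping two adjacent characters as a single edit), and `levenshtein` is an illustrative helper written here, not a library call:

```python
def levenshtein(a, b):
    """Return the number of single-character inserts, deletes and
    substitutions needed to turn a into b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


indexed = "michael jordan and scottie pippen"
queries = [
    "michael jordan and scottie pippen",
    "scottie pippen and michael jordan",
    "ichael jordan and scottie pippen",
    "ichae jordan and scottie pippen",
    "ichael jordan and cottie pippen",
    "ichael jordan and cottie pippe",
    "ichael jordan and ottie pippen",
]
for q in queries:
    d = levenshtein(q, indexed)
    print(f"{q!r}: distance {d}, within 2 edits: {d <= 2}")
```

Because the distance is computed over the whole string, missing characters in different words add up, and a reordered sentence falls far outside the limit, which is the behaviour the question asks for.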


elasticsearch