Elastic stop analyzer and fuzzy search issue

Question

I have the index and data described here, and I have set the analyzer to the stop analyzer. This works fine, because when I try a simple search

POST https:///serverURL/_search?pretty=true

{
  "query": {
    "query_string": {
      "default_field": "title",
      "query": "Rebel the without"
    }
  }
}

the server does return

            "title": "Rebel Without a Cause"

as the result.

However, when I try a fuzzy search

{
  "query": {
    "fuzzy": {
      "title": {
        "value": "Rebel the without"
      }
    }
  }
}

the result is empty. What is actually happening here? Does fuzzy search somehow disable the analyzer?

Answer 1

The fuzzy query returns documents that contain terms similar to the search term.

Because you have not defined any explicit mapping for the "title" field, it uses the standard analyzer, and the generated tokens will be:

{
  "tokens" : [
    {
      "token" : "rebel",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "without",
      "start_offset" : 6,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "a",
      "start_offset" : 14,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "cause",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

The fuzzy query will give you results for search terms that are similar to one of the generated tokens, such as wihout, case, or rebe:

GET /myidx/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "case"
      }
    }
  }
}

Update 1:

Based on the comments below, you can use the match_bool_prefix query:

{
  "query": {
    "match_bool_prefix": {
      "title": {
        "query": "Rebel the without"
      }
    }
  }
}

Answer 2

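Understanding how data is processed and stored in Elasticsearch is important for understanding this behavior. When you set the stop analyzer, any text you feed to the system is converted into a list of tokens, also called terms. At that point the Elasticsearch field "doesn't remember" your original text (technically it is stored in the _source field, but it is not indexed) and only knows those terms (each term plus its position in the original text; in your case rebel, without, cause). The terms are then stored in the inverted index for fast lookup.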
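To see these terms directly, the _analyze API can run the built-in stop analyzer over the title. A minimal sketch: the body below is sent as POST /_analyze, and no index is needed because stop is a built-in analyzer.

```json
{
  "analyzer": "stop",
  "text": "Rebel Without a Cause"
}
```

The response should list the tokens rebel, without, and cause. The word "a" is dropped as a stop word and the remaining tokens are lowercased.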

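Now you run a fuzzy query. This is a term-level query, which means it operates on individual terms: the search value is not analyzed, so the whole string "Rebel the without" is compared as a single term against indexed terms such as rebel, and none of them is within the allowed edit distance. Instead, you should use a full-text query, such as match: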

POST /fuzz/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Reble without",
        "fuzziness": "AUTO"
      }
    }
  }
}
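The match query analyzes the search text with the field's analyzer, so stop words such as "the" are only removed from the query if the title field is really mapped with the stop analyzer. A minimal sketch of such a mapping, assuming Elasticsearch 7 or later (no mapping types) and reusing the fuzz index name from the request above; the body is sent as PUT /fuzz when the index is created:

```json
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "stop"
      }
    }
  }
}
```

The analyzer of an existing field cannot be changed in place, so an index that was created without this mapping would need to be recreated and its documents reindexed.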
