Elasticsearch itself ignores special characters in the query string. How can I avoid that?

Question

I have to search for specific query_string values that contain special characters, but the search returns all results.

When I analyze the text:

GET /exact/_analyze
{
  "field": "subject",
  "text": "abdominal-scan"
}

the output is:

{
  "tokens": [
    {
      "token": "abdominal",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "scan",
      "start_offset": 10,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

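This means the hyphen (-) is skipped automatically and the text is treated as two separate words.

If I change the field's index setting to not_analyzed, I can no longer search for a single word in that field; the whole sentence has to be passed in the query_string.

Is there any other option that lets me do an exact search (without the special characters being ignored)?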

Answer 1

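You should take a look at the definitive guide's section about analysis, since it is essential for understanding how your index behaves.

By default, your field is analyzed with the standard analyzer, which splits words on hyphens.

A very simple analyzer to understand is the whitespace analyzer, which splits the input into tokens on whitespace characters only.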
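
As a quick check before creating any index, the _analyze API can also be called with an explicit analyzer name instead of a field. A minimal sketch, assuming a version that accepts the request-body form used elsewhere in this thread:

```json
GET /_analyze
{
  "analyzer": "whitespace",
  "text": "abdominal-scan"
}
```

Swapping "whitespace" for "standard" in the same request gives a side-by-side view of how each analyzer tokenizes the text.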

You can try this example:

PUT /solution
{
  "mappings":{
    "doc": {
      "properties":{
        "subject": {
          "type": "string",
          "analyzer": "whitespace"
        }
      }
    }
  }
}

GET /solution/_analyze
{
  "field": "subject",
  "text": "This is about abdominal-scan"
}

Output:

{
  "tokens": [
    {
      "token": "This",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 1
    },
    {
      "token": "about",
      "start_offset": 8,
      "end_offset": 13,
      "type": "word",
      "position": 2
    },
    {
      "token": "abdominal-scan",
      "start_offset": 14,
      "end_offset": 28,
      "type": "word",
      "position": 3
    }
  ]
}

You can see that the hyphen is preserved in this case.
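
One caveat is visible in the output above: the whitespace analyzer does not lowercase tokens (This keeps its capital T), so searches on this field are case-sensitive. If that is not wanted, a custom analyzer can combine the whitespace tokenizer with the lowercase token filter. The following is a sketch only; the index name solution_lowercase and the analyzer name whitespace_lowercase are made up for illustration, and the mapping follows the same syntax as the example above ("doc" type, "string" fields), which newer Elasticsearch versions have replaced.

```json
PUT /solution_lowercase
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "subject": {
          "type": "string",
          "analyzer": "whitespace_lowercase"
        }
      }
    }
  }
}
```

A query_string search against such a field could then look like this. The backslash (doubled because of JSON string escaping) escapes the hyphen, which is a reserved character in the query_string syntax:

```json
GET /solution_lowercase/_search
{
  "query": {
    "query_string": {
      "default_field": "subject",
      "query": "abdominal\\-scan"
    }
  }
}
```

Keep in mind that with whitespace tokenization abdominal-scan is indexed as a single token, so a search for scan alone will not match it.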
