Elasticsearch shows match with special character with only .raw

Question

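I started using Elasticsearch a few days ago. I created some analyzers and mappings and successfully inserted some data. The problem appears when I query data that contains special characters. Initially I used the standard analyzer, but after reading about more options I settled on whitespace, because it also tokenizes special characters. Even so, I still could not query the data. If I query field.raw instead (where field is the actual property of the object), I get the results I need. But .raw bypasses the analyzer, and I wonder whether that defeats the whole purpose. Since whitespace did not work for me, I reverted to standard.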

Here is the analyzer I built:

PUT demoindex
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        },
        "splcharfilter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([?/-])"
          ]
        }
      },
      "analyzer": {
        "my_field_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram",
            "splcharfilter"
          ]
        }
      }
    }
  }
}

The mapping I created:

PUT demoindex/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "my_field_analyzer",
      "search_analyzer": "simple",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    },
    "area": {
      "type": "text",
      "analyzer": "my_field_analyzer",
      "search_analyzer": "simple",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}

The query that does not work:

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area": {
              "value": "is - application"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "hem"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

The query that works:

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area.raw": {
              "value": "is - application"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "hem"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

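As you can see, I had to use area.raw to match the content and return the document. Since name should not contain any special characters, it should be fine without .raw, but other fields will contain special characters, possibly not limited to -.

So, can someone point out what I am doing wrong or what I have misunderstood? Or is there a better way to achieve this?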

P.S.: Version info

Elasticsearch: 7.10.1

Lucene: 8.7.0

Answer 1

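Keyword fields are not analyzed.
Text fields are analyzed.

To check how a value is analyzed and which tokens are generated, you can use the Analyze API (_analyze) in Elasticsearch.

In your case: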

POST demoindex/_analyze
{
  "text": ["is - application"],
  "field": "area"
}

It will output:

{
  "tokens" : [
    {
      "token" : "i"
    },
    {
      "token" : "is"
    },
    {
      "token" : "a"
    },
    {
      "token" : "ap"
    },
    {
      "token" : "app"
    },
    {
      "token" : "appl"
    },
    {
      "token" : "appli"
    },
    {
      "token" : "applic"
    },
    {
      "token" : "applica"
    },
    {
      "token" : "applicat"
    },
    {
      "token" : "applicati"
    },
    {
      "token" : "applicatio"
    },
    {
      "token" : "application"
    }
  ]
}
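The token list above can be reproduced approximately in plain Python. This is only a rough sketch of the index-time chain from the question (standard tokenizer, lowercase, edge_ngram with min_gram 1 and max_gram 20). The function names are made up for illustration and the regex is a simplification of the real standard tokenizer. The splcharfilter step is left out on the assumption that the tokenizer has already dropped ?, / and - before any filter runs.

```python
import re

def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: keep runs of word
    # characters and drop punctuation such as "-". (Simplification.)
    return re.findall(r"\w+", text)

def edge_ngrams(token, min_gram=1, max_gram=20):
    # Prefixes of the token, from min_gram up to max_gram characters.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_analyze(text):
    # Hypothetical helper mirroring my_field_analyzer:
    # tokenize, lowercase, then expand each token into edge n-grams.
    tokens = []
    for tok in standard_tokenize(text):
        tokens.extend(edge_ngrams(tok.lower()))
    return tokens

tokens = index_analyze("is - application")
print(tokens)
# The original string is not one of the stored tokens.
print("is - application" in tokens)  # False
```

The hyphen never reaches the index for the text field, so no stored token can equal the full string.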

So when you index the value "is - application", area.raw, being a keyword field, stores it as is. That is why your term query below works.

Term queries are used for exact matching and should be used on fields that are not analyzed, such as area.raw, which is a keyword field in your case.

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area.raw": {
              "value": "is - application"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

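But when you apply the same term query to the text field, it does not work: the term query tries to match the supplied value exactly, and as we saw above, the area value has been tokenized, so no indexed token equals "is - application".

Therefore, as Elasticsearch recommends, it is always better to use a match query on text (analyzed) fields. The query below should return the same document: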

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "area": {
              "query": "is - application"
            }
          }
        }
      ]
    }
  },
  "size": 15
}
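The difference between the two query types can be sketched in a few lines of Python. This is an illustration under assumptions, not how Elasticsearch is implemented: INDEXED holds the tokens from the _analyze output for a single hypothetical document, simple_analyze approximates the simple search analyzer from the mapping (split on non-letters, lowercase), and match_query assumes the default OR operator.

```python
import re

# Tokens stored for the "area" text field of one hypothetical document,
# copied from the _analyze output shown earlier.
INDEXED = {
    "i", "is", "a", "ap", "app", "appl", "appli", "applic", "applica",
    "applicat", "applicati", "applicatio", "application",
}

def term_query(value):
    # term: the value is NOT analyzed; it must equal a stored token exactly.
    return value in INDEXED

def simple_analyze(text):
    # Approximation of the "simple" analyzer: runs of letters, lowercased.
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def match_query(text):
    # match: the query text is analyzed first; with the default OR
    # operator, one matching token is enough.
    return any(tok in INDEXED for tok in simple_analyze(text))

print(simple_analyze("is - application"))  # ['is', 'application']
print(term_query("is - application"))      # False
print(match_query("is - application"))     # True
```

Because a single matching token is enough under OR, a match query can also return documents that a term query on area.raw would not.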

c#

lucene

elasticsearch

kibana