Can't get proper result from elasticsearch based on query and document tokenization

Question

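I'm trying to implement a search system in which I need to use the Edge NGram tokenizer. The settings used to create the index are shown below. I use the same tokenizer for both the documents and the search query. (The documents are in Persian.)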

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "autocomplete"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
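
To make the effect of these settings concrete, here is a minimal Python sketch that approximates what a tokenizer configured like `autocomplete` emits (min_gram 2, max_gram 10, letters only). It is not Elasticsearch code: the function name `edge_ngrams` is made up for illustration, `str.isalpha` only approximates Lucene's letter classification, and the lowercase filter is left out because Persian has no letter case.

```python
from itertools import groupby


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate an edge n-gram tokenizer with token_chars = ["letter"]."""
    tokens = []
    # Consecutive letters form one word; any other character is a separator.
    for is_letter, chars in groupby(text, key=str.isalpha):
        if not is_letter:
            continue
        word = "".join(chars)
        # Emit every prefix from min_gram up to max_gram characters.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size])
    return tokens


if __name__ == "__main__":
    print(edge_ngrams("آلمان خوب است"))
    print(edge_ngrams("آلمانی"))
```

Under these assumptions a six-letter word such as “آلمانی” yields its prefixes of length 2 through 6, and words shorter than min_gram yield no tokens at all.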

The problem is that searching for the term “آلمانی” returns 0 hits, even though I have a document containing the data “آلمان خوب است”.

As you can see, the result of analyzing the term “آلمانی” shows that it generates the token “آلمان”, so the analysis itself works properly.

{
  "tokens" : [
    {
      "token" : "آ",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "آل",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "آلم",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "آلما",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "آلمان",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "آلمانی",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
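
Output like the above comes from the `_analyze` API. As a rough sketch, the request could be sent from Python with only the standard library. The base URL `http://localhost:9200` and the helper names `build_analyze_request` and `analyze` are assumptions for illustration; the index name `test` and the analyzer name `autocomplete` come from the settings in this question.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build a POST request for the index-level _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, index, analyzer, text):
    """Send the request and return only the token strings."""
    request = build_analyze_request(base_url, index, analyzer, text)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]


if __name__ == "__main__":
    # Assumes a cluster is reachable at this address (hypothetical).
    request = build_analyze_request(
        "http://localhost:9200", "test", "autocomplete", "آلمانی"
    )
    print(request.get_method(), request.full_url)
```

Running the same call with `autocomplete_search` instead of `autocomplete` shows which terms the query side produces, which is useful when the two analyzers differ.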

The search query shown below gets 0 hits.

GET /test/_search
{
  "query": {"match": {
    "title": {"query": "آلمانی" , "operator": "and"}
  }}
}

However, searching for the term “آلما” returns the document containing the data “آلمان خوب است”. How can I solve this problem?
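
The difference between the two searches can be reproduced without a cluster. A `match` query with `"operator": "and"` requires every term produced by the search analyzer to be present in the field, and here `autocomplete_search` splits the query into edge n-grams as well. The following Python sketch simulates that rule; the helpers `edge_ngrams` and `matches` are hypothetical, and scoring, positions, and the lowercase filter are ignored.

```python
from itertools import groupby


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate the edge n-gram tokenizer (letters only)."""
    tokens = []
    for is_letter, chars in groupby(text, key=str.isalpha):
        if is_letter:
            word = "".join(chars)
            for size in range(min_gram, min(max_gram, len(word)) + 1):
                tokens.append(word[:size])
    return tokens


def matches(doc_text, query_text, operator="and"):
    """Simulate a match query when both sides are edge n-grammed."""
    indexed_terms = set(edge_ngrams(doc_text))
    query_terms = edge_ngrams(query_text)
    if not query_terms:
        return False
    found = [term in indexed_terms for term in query_terms]
    # "and" needs every query term in the index; "or" needs at least one.
    return all(found) if operator == "and" else any(found)


if __name__ == "__main__":
    doc = "آلمان خوب است"
    for query in ("آلمانی", "آلما"):
        for operator in ("and", "or"):
            print(query, operator, matches(doc, query, operator))
```

In this simulation the query “آلمانی” contributes the six-letter term “آلمانی” itself, which the document “آلمان خوب است” never produced, so the `and` condition fails. Every n-gram of “آلما” is a prefix of “آلمان”, so that query passes.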

Your help would be much appreciated.

Answer 1

I found a DevTicks post by Ricardo Heck that solved my problem; see that post for a more detailed description.

I changed the mapping settings like this:

  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "autocomplete"
            }
          }
        }
      }
    }
  }

Now searching for the term “آلمانی” returns the document “آلمان خوب است”.

search

tokenize

n-gram

elasticsearch

elasticsearch-analyzers