Searching within a text using ngram for the minimum chars of the search pattern and above

Question

I have a text index on my Elasticsearch server. I have implemented an ngram tokenizer like this:

"analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": "3",
          "max_gram": "7"
        }
      }
    },

Suppose my data is

"Hello beautiful world ell"

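When I run a match query for "Hell", I want it to find only my first word (Hello) and not the word "ell". So basically I don't want it to "break" my search pattern into smaller pieces; the pattern should only be found in my data as it is (with 4 characters, not fewer).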

Thanks

Answer 1

The solution is to change the tokenizer in the analyzer that is applied at search time.

For example, you can do this:

"some_analyzer": {
    "type": "custom",
    "tokenizer": "whitespace",
    "filter": [ "lowercase" ]
  }
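As a rough sketch of how the two analyzers might be wired together, the index body below is built as a Python dict and printed as JSON. The field name `content` is hypothetical, and the `lowercase` filter on `ngram_analyzer` is an assumption that is not in the question: without it the indexed grams keep their case ("Hell") and would not match the lowercased search term ("hell"). `analyzer` and `search_analyzer` are the standard mapping parameters for index-time and search-time analysis.

```python
import json

# Hypothetical index body; "content" is an illustrative field name.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Index-time analyzer from the question, plus an assumed
                # lowercase filter so it agrees with the search analyzer.
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"],
                },
                # Search-time analyzer from this answer: no ngram tokenizer.
                "some_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                },
            },
            "tokenizer": {
                "ngram_tokenizer": {"type": "ngram", "min_gram": 3, "max_gram": 7}
            },
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ngram_analyzer",         # used when indexing
                "search_analyzer": "some_analyzer",   # used when querying
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

On Elasticsearch 5.x the `properties` block sits under a mapping type name, and newer versions may also need `index.max_ngram_diff` raised to allow a 3 to 7 gram range, so treat the exact layout as version-dependent.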

The important thing is that your search analyzer does not use the nGram tokenizer.
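To see why this matters, here is a small pure-Python simulation. It is not Elasticsearch code. It assumes two separate documents ("Hello" and "ell") instead of the single string from the question, lowercases everything on both sides, and treats a document as a hit when it shares at least one term with the query, which mirrors the default OR behaviour of a match query.

```python
def ngrams(text, min_gram=3, max_gram=7):
    """Return every substring of length min_gram..max_gram, lowercased."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def whitespace_lowercase(text):
    """Rough stand-in for a whitespace tokenizer plus lowercase filter."""
    return set(text.lower().split())


# Hypothetical documents, indexed with the ngram analyzer.
docs = {"doc_hello": "Hello", "doc_ell": "ell"}
index = {doc_id: ngrams(body) for doc_id, body in docs.items()}


def search(query_terms):
    """A document matches if it shares at least one term with the query."""
    return {doc_id for doc_id, terms in index.items() if terms & query_terms}


# Query analyzed with ngrams: "hell" is broken into "hel", "ell", "hell".
ngram_hits = search(ngrams("Hell"))

# Query analyzed without ngrams: the only search term is "hell".
plain_hits = search(whitespace_lowercase("Hell"))

print(sorted(ngram_hits))
print(sorted(plain_hits))
```

With the ngram analyzer on the query side, the gram "ell" is enough to pull in the "ell" document. With the whitespace analyzer the query stays a single 4-character term, so only documents whose indexed grams contain "hell" are returned.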
