How to generate multi-word search suggestions

Question

I am building a small search application with Elasticsearch and trying to figure out how to build autocomplete with multi-word (phrase) suggestions. I have it working... sort of...

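I mostly get single-word suggestions, but as soon as I hit the space bar the suggestions disappear.

For example, if I type "fast" it works fine, but if I type "fast " (with a trailing space) the suggestions stop showing.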

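I am using Edge N Grams and match_phrase_prefix, following the examples here and here. For the _all field used in match_phrase_prefix, I set include_in_all: false on every field except title and content. I am starting to think this happens only because I am testing on a small dataset and there simply aren't enough tokenized terms to produce multi-word suggestions. Please see the relevant code below, and if something is wrong, let me know where I went wrong.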

"analysis": {
  "filter": {
    "autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": "1",
      "max_gram": "20",
      "token_chars": [
        "letter",
        "digit"
      ]
    }
  },
  "analyzer": {
    "autocomplete": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding",
        "autocomplete_filter"
      ]
    },
    "whitespace_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": [
        "lowercase",
        "asciifolding"
      ]
    }
  }
}
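
To make it easier to see what the autocomplete analyzer above puts in the index, here is a rough pure-Python approximation of it. This is only a sketch, not Elasticsearch itself: the helper names edge_ngrams and simulate_autocomplete are made up for illustration, str.split() stands in for the whitespace tokenizer, and asciifolding is left out because the sample text is plain ASCII.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token, min_gram to max_gram characters long."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def simulate_autocomplete(text):
    """Approximate the analyzer: split on whitespace, lowercase, then edge n-grams."""
    tokens = []
    for word in text.split():  # stand-in for the whitespace tokenizer
        tokens.extend(edge_ngrams(word.lower()))
    return tokens


index_tokens = simulate_autocomplete("Fast car")
print(index_tokens)
# Check whether any indexed token contains a space.
print(any(" " in token for token in index_tokens))
```

In this approximation each word is n-grammed separately, so none of the indexed tokens spans two words. Whether that is what breaks the "fast " case depends on the query and the search analyzer, which are not shown in the question.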

Answer 1

Try the keyword tokenizer:

"autocomplete": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": [
    "lowercase",
    "asciifolding",
    "autocomplete_filter"
  ]
}
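
As a rough illustration of why this might help, the sketch below approximates the analyzer with the keyword tokenizer in plain Python. It is not Elasticsearch itself: the helper names are hypothetical, asciifolding is skipped because the sample is ASCII, and max_gram is taken as 20 to match the filter in the question.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the leading substrings of token, min_gram to max_gram characters long."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def simulate_keyword_autocomplete(text):
    """Approximate keyword tokenizer + lowercase + edge n-grams."""
    # The keyword tokenizer emits the whole input as a single token,
    # so the edge n-grams can run across the space between words.
    return edge_ngrams(text.lower())


tokens = simulate_keyword_autocomplete("Fast car")
print(tokens)
print("fast " in tokens)
```

With the whole field value treated as one token, prefixes such as "fast " and "fast c" end up among the tokens. Note that with max_gram set to 20, only the first 20 characters of a value are covered.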

For reference, see: elasticsearch mapping tokenizer keyword to avoid splitting tokens and enable use of wildcard

This is because, by default, the standard analyzer splits text on whitespace. You can check your tokens with, for example:

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'FC Schalke 04'

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html
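
The same token check can be sketched in Python. As far as I know, recent Elasticsearch versions expect the analyzer name and the text in a JSON request body rather than in the query string, so this version builds the request that way. The index name test and the analyzer name my_edge_ngram_analyzer are taken from the command above, the host is an assumption, and build_analyze_request is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, analyzer, text):
    """Build (but do not send) a POST request for the index's _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request(
    "http://localhost:9200",  # assumed local cluster address
    "test",
    "my_edge_ngram_analyzer",
    "FC Schalke 04",
)
print(request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The response lists the tokens the analyzer produced, which shows directly whether any of them contains a space.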

elasticsearch

match-phrase