Elasticsearch case insensitive wildcard search with spaced words

The field priorityName is of the search_as_you_type data type.

My use case is that I want to search for documents using the following terms:

  1. "let's" -> should return both documents
  2. "DOING" -> should return both documents
  3. "are you" -> should return both documents
  4. "Are You" -> should return both documents
  5. "you do" -> should return both documents
  6. "re you" -> should return both documents

Out of these 6, only the first 5 give me the results I want using multi_match. How can I handle the 6th use case, where the partial word does not start at the first character of a word?
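
For reference, a multi_match query against a search_as_you_type field typically uses type bool_prefix over the subfields the type creates automatically (priorityName._2gram, priorityName._3gram); the request below is a sketch of that pattern, not necessarily my exact query:

POST /priority/_search

{
    "query": {
        "multi_match": {
            "query": "re you",
            "type": "bool_prefix",
            "fields": [
                "priorityName",
                "priorityName._2gram",
                "priorityName._3gram"
            ]
        }
    }
}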

Sample documents

    "hits": [
      {
        "_index": "priority",
        "_type": "_doc",
        "_id": "vaCI_HAB31AaC-t5TO9H",
        "_score": 1,
        "_source": {
          "priorityName": "What are you doing along Let's Go out"
        }
      },
      {
        "_index": "priority",
        "_type": "_doc",
        "_id": "vqCQ_HAB31AaC-t5wO8m",
        "_score": 1,
        "_source": {
          "priorityName": "what are you doing along let's go for shopping"
        }
      }
    ]
  }

For the last search, re you, you need infix tokens, which the search_as_you_type data type does not generate by default. I would suggest creating a custom analyzer that generates infix tokens and lets you match all 6 of your queries.

I created such a custom analyzer and tested it with your sample documents; all 6 queries return both sample documents.

Index mapping

PUT /ngram

{
    "settings": {
        "max_ngram_diff": 50,
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "autocomplete_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                },
                "lowercase_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "priorityName": {
                "type": "text",
                "analyzer": "autocomplete_analyzer",
                "search_analyzer": "standard" --> note this: search terms are analyzed with standard, so they are lower-cased but not split into n-grams
            }
        }
    }
}
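
You can check which tokens the autocomplete_analyzer emits with the _analyze API (a quick sketch; it assumes the index is named ngram, matching the results further down):

POST /ngram/_analyze

{
    "analyzer": "autocomplete_analyzer",
    "text": "are you"
}

Among the emitted tokens you will find re and you, which is why a partial term like re you can match at search time.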

Index your sample documents

{
  "priorityName" : "What are you doing along Let's Go out"
}

{
  "priorityName" : "what are you doing along let's go for shopping"
}
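
A minimal sketch of the indexing calls (the index name ngram and the ids 1 and 2 are assumptions, chosen to match the result shown below):

PUT /ngram/_doc/1

{
  "priorityName" : "What are you doing along Let's Go out"
}

PUT /ngram/_doc/2

{
  "priorityName" : "what are you doing along let's go for shopping"
}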

Search query for the last case, re you

{
    "query": {
        "match" : {
            "priorityName" : "re you"
        }
    }
}

Search result

"hits": [
      {
        "_index": "ngram",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.4652853,
        "_source": {
          "priorityName": "What are you doing along Let's Go out"
        }
      },
      {
        "_index": "ngram",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.4509768,
        "_source": {
          "priorityName": "what are you doing along let's go for shopping"
        }
      }
    ]

The other queries also returned both documents, but I am not including all of them here to keep this answer short.
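
As one more illustration (a sketch, using the same assumed index name), case 4, Are You, matches with the same kind of match query, because the lowercase token filter at index time and the standard analyzer at search time both lower-case the text:

POST /ngram/_search

{
    "query": {
        "match" : {
            "priorityName" : "Are You"
        }
    }
}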

Note: below are some useful links to read about the answer in more detail.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html