Could I combine wildcard and full-text search in Elasticsearch?

Question

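For example, I have some title data in Elasticsearch like this:
gamexxx_nightmare,
gamexxx_little_guy

Then I input
game => should find gamexxx_nightmare and gamexxx_little_guy
little guy => should find gamexxx_little_guy?

My first thought is to use a wildcard so that game matches gamexxx, and then a full-text search for the second case. How do I combine them in one DSL?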

Answer 1

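NGrams perform better than wildcards. A wildcard query (especially one with a leading wildcard) has to scan the index to see what matches the pattern. Ngrams break the text into small tokens at index time. For example, Quick Foxes will be stored as [ Qui, uic, ick, Fox, oxe, xes ], depending on the min_gram and max_gram sizes.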

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text":{
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
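
To make the tokenizer settings above more concrete, here is a minimal Python sketch that roughly emulates an ngram tokenizer with token_chars set to letter and digit. The function name ngram_tokens is hypothetical and this is not Elasticsearch code; it only illustrates which tokens end up in the index under those assumptions.

```python
def ngram_tokens(text, min_gram=3, max_gram=3):
    """Rough emulation of an ngram tokenizer with token_chars = [letter, digit]."""
    tokens = []
    word = ""
    # The trailing space flushes the last word.
    for ch in text + " ":
        if ch.isalnum():
            word += ch
            continue
        # Any other character (space, underscore, ...) ends the current word.
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
        word = ""
    return tokens


print(ngram_tokens("Quick Foxes"))
# ['Qui', 'uic', 'ick', 'Fox', 'oxe', 'xes']
print(ngram_tokens("gamexxx_little_guy"))
# ['gam', 'ame', 'mex', 'exx', 'xxx', 'lit', 'itt', 'ttl', 'tle', 'guy']
```

Because the underscore is neither a letter nor a digit, it splits the title into separate words before the trigrams are cut, which is why a search for little guy can find gamexxx_little_guy.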

Query

GET my_index/_search
{
  "query": {
    "match": {
      "text": "little guy"
    }
  }
}

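If you only want to use a wildcard, you can search on a not_analyzed string (a keyword field in current versions; this assumes the mapping defines a text.keyword sub-field). This takes care of spaces between words, because the whole value is kept as a single term.

GET my_index/_search
{
  "query": {
    "wildcard": {
      "text.keyword": {
        "value": "*gamexxx*"
      }
    }
  }
}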

Answer 2

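While Jaspreet's answer is correct, it doesn't combine both requirements in a single query DSL, as the OP asked in the question: How to combine them in one DSL??

This is an enhancement to Jaspreet's solution: I also don't use a wildcard, and I even avoid the n-gram analyzer, which is too costly (it increases the index size) and requires re-indexing if the requirements change.

A single search query that combines both requirements can be done as follows:

Index mapping (note the replace_underscore char filter on the analyzer)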

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "char_filter": [
                        "replace_underscore"
                    ]
                }
            },
            "char_filter": {
                "replace_underscore": {
                    "type": "mapping",
                    "mappings": [
                        "_ => \\u0020"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer" : "my_analyzer"
            }
        }
    }
}
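
To illustrate why the char filter matters, here is a minimal Python sketch of what the analyzer above does to a title. It is an approximation, not Elasticsearch code: the function name analyze_title is hypothetical, and the standard tokenizer is replaced by a plain whitespace split. As far as I know, the standard tokenizer keeps words joined by an underscore together as one token, which is why the underscore is mapped to a space first.

```python
def analyze_title(text):
    """Approximate the custom analyzer: char filter, then a simplified tokenizer."""
    # Char filter step: map every underscore to a space before tokenizing.
    filtered = text.replace("_", " ")
    # Simplified stand-in for the standard tokenizer: split on whitespace only.
    return filtered.split()


doc_tokens = analyze_title("gamexxx_little_guy")
query_tokens = analyze_title("little guy")

print(doc_tokens)    # ['gamexxx', 'little', 'guy']
print(query_tokens)  # ['little', 'guy']

# With the default OR operator, a match query needs at least one query token
# to appear among the document's tokens.
print(any(token in doc_tokens for token in query_tokens))  # True
```

With the title split into gamexxx, little and guy, a prefix query for game and a match query for little guy can both work on the same field without n-grams.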

Index your sample docs

{
   "title" : "gamexxx_little_guy"
}

And

{
   "title" : "gamexxx_nightmare"
}

Single search query (note the top-level must clause)

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "must": [
                            {
                                "prefix": {
                                    "title": {
                                        "value": "game"
                                    }
                                }
                            }
                        ]
                    }
                },
                {
                    "bool": {
                        "must": [
                            {
                                "match": {
                                    "title": {
                                        "query": "little guy"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Result

 {
        "_index": "so-46873023",
        "_type": "_doc",
        "_id": "2",
        "_score": 2.2814486,
        "_source": {
           "title": "gamexxx_little_guy"
        }
     }

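Important points:

The first part of the query is a prefix query, which matches game in both documents. (This avoids a costly regex.)
The second part allows full-text search. To achieve this, I used a custom analyzer that replaces _ with whitespace, so you don't need expensive n-grams in the index and a simple match query will fetch the results.
The query above returns results that satisfy both criteria. You can change the top-level bool clause from must to should if you want to return results matching either criterion.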
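
The must versus should choice from the last point can be sketched as a small Python helper that builds the request body as a plain dict. This is only a sketch: build_query and its parameters are hypothetical names, and as my own simplification it drops the single-clause inner bool wrappers from the query above, which should not change which documents match.

```python
import json


def build_query(prefix, text, field="title", operator="must"):
    """Build a bool query combining a prefix clause and a match clause."""
    if operator not in ("must", "should"):
        raise ValueError("operator must be 'must' or 'should'")
    return {
        "query": {
            "bool": {
                # "must": both clauses have to match.
                # "should": with no other clauses, at least one has to match.
                operator: [
                    {"prefix": {field: {"value": prefix}}},
                    {"match": {field: {"query": text}}},
                ]
            }
        }
    }


# Body for "match either criterion"; send it to the index's _search endpoint.
print(json.dumps(build_query("game", "little guy", operator="should"), indent=2))
```

Calling build_query("game", "little guy") with the default operator gives the stricter variant, which returns only gamexxx_little_guy for the sample documents.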
