Elasticsearch wildcard, regexp, match_phrase, prefix query returning wrong results

Question

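I have just started using Elasticsearch, version 7.5.1.

I want to query for results that start with a particular word fragment. For example, tho* should return data containing: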

thought, Thomson, those, etc.

I have tried:

regexp

[{'regexp':{'f1':'tho.*'}},{'regexp':{'f2':'tho.*'}}]

wildcard

[{'wildcard':{'f1':'tho*'}},{'wildcard':{'f2':'tho*'}}]

prefix

[{'prefix':{'f1':'tho'}},{'prefix':{'f2':'tho'}}]

match_phrase

'multi_match': {'query': 'tho', 'fields': ['f1', 'f2', 'f3'], 'type': 'phrase'}
# also tried with type phrase_prefix

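All of these return the correct results, but they all also return the word method.

Similarly, cat* returns the word communication.

What am I doing wrong? Is this related to the analyzer?

Edit - here is the field mapping: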

'f1': {
    'full_name': 'f1',
    'mapping': {
        'f1': {
            'type': 'text',
            'analyzer': 'some_analyzer',
            'index_phrases': true
        }
    }
},

Answer 1

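Since you have not provided your index mapping, and since, as you mention, you are also getting method in the search results, I think the problem is with the analyzer you have configured.

One possibility is that you have set up an ngram tokenizer. It splits each word into character n-grams, so it produces a tho token for every one of these words, because they all contain tho (method included).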
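The n-gram explanation can be illustrated with a minimal Python sketch. It does not call Elasticsearch. The helper names `ngram_tokens` and `prefix_matches`, the gram sizes 3 to 4, and the lowercasing step are all assumptions made for illustration, since the definition of some_analyzer is not shown in the question. The sketch only mimics the idea that a prefix query matches a document when any indexed token starts with the search term.

```python
def ngram_tokens(text, min_gram=3, max_gram=4):
    """Return the set of character n-grams of text.

    Rough stand-in for an ngram tokenizer followed by a lowercase filter;
    the gram sizes are assumed, not taken from the question.
    """
    text = text.lower()
    tokens = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            tokens.add(text[start:start + size])
    return tokens


def prefix_matches(prefix, tokens):
    """True if any indexed token starts with the prefix."""
    return any(token.startswith(prefix) for token in tokens)


if __name__ == "__main__":
    for term, word in [("tho", "thought"), ("tho", "method"), ("cat", "communication")]:
        print(term, word, prefix_matches(term, ngram_tokens(word)))
```

Under these assumptions, method is indexed with a tho token and communication with a cat token, which would explain both unexpected hits.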

Here is a working example with index data, mapping, search queries, and search results.

Index mapping:

{
  "mappings": {
    "properties": {
      "f1": {
        "type": "text"
      }
    }
  }
}

Index data:

{
  "f1": "method"
}
{
  "f1": "thought"
}
{
  "f1": "Thomson"
}
{
  "f1": "those"
}

Search query using the wildcard query:

{
  "query": {
    "wildcard": {
      "f1": {
        "value": "tho*"
      }
    }
  }
}

Search query using the prefix query:

{
  "query": {
    "prefix": {
      "f1": {
        "value": "tho"
      }
    }
  }
}

Search query using the regexp query:

{
  "query": {
    "regexp": {
      "f1": {
        "value": "tho.*"
      }
    }
  }
}

Search query using the match_phrase_prefix query:

{
  "query": {
    "match_phrase_prefix": {
      "f1": {
        "query": "tho"
      }
    }
  }
}

The search result for all four of the above queries is:

"hits": [
      {
        "_index": "67673694",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.2039728,
        "_source": {
          "f1": "thought"
        }
      },
      {
        "_index": "67673694",
        "_type": "_doc",
        "_id": "2",
        "_score": 1.2039728,
        "_source": {
          "f1": "Thomson"
        }
      },
      {
        "_index": "67673694",
        "_type": "_doc",
        "_id": "3",
        "_score": 1.2039728,
        "_source": {
          "f1": "those"
        }
      }
    ]
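These results follow from how a text field with no explicit analyzer is indexed: the default standard analyzer emits each word as a single lowercase token, so method is stored whole and does not start with tho. The Python sketch below is a rough approximation only. The regex split is a simplification of the standard tokenizer, and `standard_like_tokens` and `prefix_matches` are hypothetical helper names, not Elasticsearch APIs.

```python
import re


def standard_like_tokens(text):
    """Approximate the standard analyzer: split into word tokens, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def prefix_matches(prefix, tokens):
    """True if any indexed token starts with the prefix."""
    return any(token.startswith(prefix) for token in tokens)


# Same four documents as in the index data above.
docs = ["method", "thought", "Thomson", "those"]
matching = [doc for doc in docs if prefix_matches("tho", standard_like_tokens(doc))]
print(matching)
```

Thomson matches in this sketch because the token is lowercased to thomson before it is compared with tho.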

